summaryrefslogtreecommitdiff
path: root/drivers/nvme
AgeCommit message (Collapse)AuthorFilesLines
2022-09-27nvmet-tcp: handle ICReq PDU received in NVMET_TCP_Q_LIVE stateVarun Prakash1-0/+7
As per NVMe/TCP transport specification ICReq PDU is the first PDU received by the controller and controller should receive only one ICReq PDU. If controller receives more than one ICReq PDU then this can be considered as fatal error. nvmet-tcp driver does not check for ICReq PDU opcode if queue state is NVMET_TCP_Q_LIVE. In LIVE state ICReq PDU is treated as CapsuleCmd PDU, this can result in abnormal behavior. Add a check for ICReq PDU in nvmet_tcp_done_recv_pdu() to fix this issue. Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvmet-tcp: fix NULL pointer dereference during releasezhenwei pi1-3/+16
nvmet-tcp frees CMD buffers in nvmet_tcp_uninit_data_in_cmds(), and waits the inflight IO requests in nvmet_sq_destroy(). During wait the inflight IO requests, the callback nvmet_tcp_queue_response() is called from backend after IO complete, this leads a typical Use-After-Free issue like this: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 107f80067 P4D 107f80067 PUD 10789e067 PMD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 123 Comm: kworker/1:1H Kdump: loaded Tainted: G E 6.0.0-rc2.bm.1-amd64 #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp] RIP: 0010:shash_ahash_digest+0x2b/0x110 Code: 1f 44 00 00 41 57 41 56 41 55 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 08 44 8b 67 30 45 85 e4 74 1c 48 8b 57 38 b8 00 10 00 00 <44> 8b 7a 08 44 29 f8 39 42 0c 0f 46 42 0c 41 39 c4 76 43 48 8b 03 RSP: 0018:ffffc9000051bdd8 EFLAGS: 00010206 RAX: 0000000000001000 RBX: ffff888100ab5470 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff888100ab5470 RDI: ffff888100ab5420 RBP: ffff888100ab5420 R08: ffff8881024d08c8 R09: ffff888103e1b4b8 R10: 8080808080808080 R11: 0000000000000000 R12: 0000000000001000 R13: 0000000000000000 R14: ffff88813412bd4c R15: ffff8881024d0800 FS: 0000000000000000(0000) GS:ffff88883fa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000104b48000 CR4: 0000000000350ee0 Call Trace: <TASK> nvmet_tcp_io_work+0xa52/0xb52 [nvmet_tcp] ? __switch_to+0x106/0x420 process_one_work+0x1ae/0x380 ? process_one_work+0x380/0x380 worker_thread+0x30/0x360 ? process_one_work+0x380/0x380 kthread+0xe6/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 Separate nvmet_tcp_uninit_data_in_cmds() into two steps: uninit data in cmds <- new step 1 nvmet_sq_destroy(); cancel_work_sync(&queue->io_work); free CMD buffers <- new step 2 Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme-pci: report the actual number of tagset mapsKeith Busch1-2/+4
We've been reporting 2 maps regardless of whether the module parameter asked for anything beyond the default queues. A consequence of this means that blk-mq will reinitialize the all the hardware contexts and io schedulers on every controller reset when the mapping is exactly the same as before. This unnecessary overhead is adding several milliseconds on a reset for environments that don't need it. Report the actual number of mappings in use. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme-pci: set min_align_mask before calculating max_hw_sectorsRishabh Bhatnagar1-1/+2
If swiotlb is force enabled dma_max_mapping_size ends up calling swiotlb_max_mapping_size which takes into account the min align mask for the device. Set the min align mask for nvme driver before calling dma_max_mapping_size while calculating max hw sectors. Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme: send a rediscover uevent when a persistent discovery controller reconnectsSagi Grimberg2-0/+11
When a discovery controller is disconnected, no AENs will arrive to notify the host about discovery log change events. In order to solve this, send a uevent notification when a persistent discovery controller reconnects. We add a new ctrl flag NVME_CTRL_STARTED_ONCE that will be set on the first start, and consecutive calls will find it set, and send the event to userspace if the controller is a discovery controller. Upon the event reception, userspace will re-read the discovery log page and will act upon changes as it sees fit. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme: enumerate controller flagsSagi Grimberg1-2/+5
We expect to grow a few of these flags for various purposes so make them a proper enumeration. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme-pci: disable Write Zeroes on Phison E3C/E4CTina Hsu1-0/+4
E3C/E4C SSDs do support the Write Zeroes command in theory, but have very bad performance when using it. As the firmware has been frozen for these products we can not expect firmware improvements for it, so disable Write Zeroes. Signed-off-by: Tina Hsu <tina_hsu@phison.corp-partner.google.com> [hch: update the commit message] Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devicesMichael Kelley1-3/+3
The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are non-functional on NVMe devices because the nvme_pr_clear() and nvme_pr_release() functions set the IEKEY field incorrectly. The IEKEY field should be set only when the key is zero (i.e, not specified). The current code does it backwards. Furthermore, the NVMe spec describes the persistent reservation "clear" function as an option on the reservation release command. The current implementation of nvme_pr_clear() erroneously uses the reservation register command. Fix these errors. Note that NVMe version 1.3 and later specify that setting the IEKEY field will return an error of Invalid Field in Command. The fix will set IEKEY when the key is zero, which is appropriate as these ioctls consider a zero key to be "unspecified", and the intention of the spec change is to require a valid key. Tested on a version 1.4 PCI NVMe device in an Azure VM. Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code") Fixes: 1d277a637a71 ("NVMe: Add persistent reservation ops") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme: ensure subsystem reset is single threadedKeith Busch1-3/+13
The subsystem reset writes to a register, so we have to ensure the device state is capable of handling that otherwise the driver may access unmapped registers. Use the state machine to ensure the subsystem reset doesn't try to write registers on a device already undergoing this type of reset. Link: https://bugzilla.kernel.org/show_bug.cgi?id=214771 Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme: restrict management ioctls to adminKeith Busch1-0/+6
The passthrough commands already have this restriction, but the other operations do not. Require the same capabilities for all users as all of these operations, which include resets and rescans, can be disruptive. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme: copy firmware_rev on each initKeith Busch1-1/+2
The firmware revision can change on after a reset so copy the most recent info each time instead of just the first time, otherwise the sysfs firmware_rev entry may contain stale data. Reported-by: Jeff Lien <jeff.lien@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme: handle effects after freeing the requestKeith Busch4-14/+23
If a reset occurs after the scan work attempts to issue a command, the reset may quisce the admin queue, which blocks the scan work's command from dispatching. The scan work will not be able to complete while the queue is quiesced. Meanwhile, the reset work will cancel all outstanding admin tags and wait until all requests have transitioned to idle, which includes the passthrough request. But the passthrough request won't be set to idle until after the scan_work flushes, so we're deadlocked. Fix this by handling the end effects after the request has been freed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216354 Reported-by: Jonathan Derrick <Jonathan.Derrick@solidigm.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-21fs: add batch and poll flags to the uring_cmd_iopoll() handlerJens Axboe2-6/+12
We need the poll_flags to know how to poll for the IO, and we should have the batch structure in preparation for supporting batched completions with iopoll. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21nvme: wire up async polling for io passthrough commandsKanchan Joshi4-5/+72
Store a cookie during submission, and use that to implement completion-polling inside the ->uring_cmd_iopoll handler. This handler makes use of existing bio poll facility. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://lore.kernel.org/r/20220823161443.49436-5-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-19nvme-tcp: print actual source IP address through sysfs "address" attrMartin Belanger1-1/+20
TCP transport relies on the routing table to determine which source address and interface to use when making a connection. Currently, there is no way to tell from userspace where a connection was made. This patch exposes the actual source address using a new field named "src_addr=" in the "address" attribute. This is needed to diagnose and identify connectivity issues. With the source address we can infer the interface associated with each connection. This was tested with nvme-cli 2.0 to verify it does not have any adverse effect. The new "src_addr=" field will simply be displayed in the output of the "list-subsys" or "list -v" commands as shown here. $ nvme list-subsys nvme-subsys0 - NQN=nqn.2014-08.org.nvmexpress.discovery \ +- nvme0 tcp traddr=192.168.56.1,trsvcid=8009,src_addr=192.168.56.101 live Signed-off-by: Martin Belanger <martin.belanger@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvmet-tcp: don't map pages which can't come from HIGHMEMFabio M. De Francesco1-31/+13
kmap() is being deprecated in favor of kmap_local_page().[1] There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. The pages which will be mapped are allocated in nvmet_tcp_map_data(), using the GFP_KERNEL flag. This assures that they cannot come from HIGHMEM. This imply that a straight page_address() can replace the kmap() of sg_page(sg) in nvmet_tcp_map_pdu_iovec(). As a side effect, we might also delete the field "nr_mapped" from struct "nvmet_tcp_cmd" because, after removing the kmap() calls, there would be no longer any need of it. In addition, there is no reason to use a kvec for the command receive data buffers iovec, use a bio_vec instead and let iov_iter handle the buffer mapping and data copy. Test with blktests on a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. [1] "[PATCH] checkpatch: Add kmap and kmap_atomic to the deprecated list" https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com/ Cc: Chaitanya Kulkarni <chaitanyak@nvidia.com> Cc: Keith Busch <kbusch@kernel.org> Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> [sagi: added bio_vec plus minor naming changes] Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme-pci: move iod dma_len fill gapsKeith Busch1-1/+1
The 32-bit field, dma_len, packs better in the iod struct above the dma_addr_t on 64-bit systems. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme-pci: iod npages fits in s8Keith Busch1-13/+16
The largest allowed transfer is 4MB, which can use at most 1025 PRPs. Each PRP is 8 bytes, so the maximum number of 4k nvme pages needed for the iod_list is 3, which fits in an 's8' type. While modifying this field, change the name to "nr_allocations" to better represent that this is referring to the number of units allocated from a dma_pool. Also introduce a BUILD_BUG_ON to ensure we never accidently increase the largest transfer limit beyond 127 chained prp lists. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme-pci: iod's 'aborted' is a boolKeith Busch1-3/+3
It's only true or false, so make this a bool to reflect that and save some space in nvme_iod. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme-pci: remove nvme_queue from nvme_iodKeith Busch1-15/+13
We can get the nvme_queue from the req just as easily, so remove the duplicate path to the same structure to save some space. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme: consider also host_iface when checking ip optionsDaniel Wagner1-5/+18
It's perfectly fine to use the same traddr and trsvcid more than once as long we use different host interface. This is used in setups where the host has more than one interface but the target exposes only one traddr/trsvcid combination. Use the same acceptance rules for host_iface as we have for host_traddr. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme-rdma: handle number of queue changesDaniel Wagner1-5/+21
On reconnect, the number of queues might have changed. In the case where we have more queues available than previously we try to access queues which are not initialized yet. The other case where we have less queues than previously, the connection attempt will fail because the target doesn't support the old number of queues and we end up in a reconnect loop. Thus, only start queues which are currently present in the tagset limited by the number of available queues. Then we update the tagset and we can start any new queue. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme-tcp: handle number of queue changesDaniel Wagner1-5/+21
On reconnect, the number of queues might have changed. In the case where we have more queues available than previously we try to access queues which are not initialized yet. The other case where we have less queues than previously, the connection attempt will fail because the target doesn't support the old number of queues and we end up in a reconnect loop. Thus, only start queues which are currently present in the tagset limited by the number of available queues. Then we update the tagset and we can start any new queue. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvmet: expose max queues to configfsDaniel Wagner1-0/+29
Allow to set the max queues the target supports. This is useful for testing the reconnect attempt of the host with changing numbers of supported queues. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvmet: avoid unnecessary flush bioGuixin Liu1-0/+8
For no volatile write cache block device backend, sending flush bio is unnecessary, avoid to do that. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvmet-auth: remove redundant parameters reqGenjian Zhang1-2/+2
The parameter is not used in this function, so remove it. Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvmet-auth: clean up with done_kfreeJackie Liu1-4/+2
Jump directly to done_kfree to release d, which is consistent with the code style behind. Reported-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme-auth: remove the redundant req->cqe->result.u16 assignment operationJackie Liu1-1/+0
req->cqe->result.u16 has already been assigned in the previous line, no need to do it again. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme: move from strlcpy with unused retval to strscpyWolfram Sang4-4/+4
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-19nvme: add comment for unaligned "fake" nqnLinjun Bao1-1/+5
Current "fake" nqn field is "nqn.2014.08.org.nvmexpress:", it is not aligned with the canonical version for history reasons. Signed-off-by: Linjun Bao <meljbao@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-09Merge tag 'block-6.0-2022-09-09' of git://git.kernel.dk/linux-blockLinus Torvalds4-12/+32
Pull block fixes from Jens Axboe: - NVMe pull via Christoph: - fix a use after free in nvmet (Bart Van Assche) - fix a use after free when detecting digest errors (Sagi Grimberg) - fix regression that causes sporadic TCP requests to time out (Sagi Grimberg) - fix two off by ones errors in the nvmet ZNS support (Dennis Maisenbacher) - requeue aen after firmware activation (Keith Busch) - Fix missing request flags in debugfs code (me) - Partition scan fix (Ming) * tag 'block-6.0-2022-09-09' of git://git.kernel.dk/linux-block: block: add missing request flags to debugfs code nvme: requeue aen after firmware activation nvmet: fix mar and mor off-by-one errors nvme-tcp: fix regression that causes sporadic requests to time out nvme-tcp: fix UAF when detecting digest errors nvmet: fix a use-after-free block: don't add partitions if GD_SUPPRESS_PART_SCAN is set
2022-09-07nvme: requeue aen after firmware activationKeith Busch1-3/+11
The driver prevents async event work while handling a processing paused event, but someone needs to restart it after the controller returns to a live state. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216400 Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-07nvmet: fix mar and mor off-by-one errorsDennis Maisenbacher1-2/+15
Maximum Active Resources (MAR) and Maximum Open Resources (MOR) are 0's based vales where a value of 0xffffffff indicates that there is no limit. Decrement the values that are returned by bdev_max_open_zones and bdev_max_active_zones as the block layer helpers are not 0's based. A 0 returned by the block layer helpers indicates no limit, thus convert it to 0xffffffff (U32_MAX). Fixes: aaf2e048af27 ("nvmet: add ZBD over ZNS backend support") Suggested-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-06nvme-tcp: fix regression that causes sporadic requests to time outSagi Grimberg1-4/+1
When we queue requests, we strive to batch as much as possible and also signal the network stack that more data is about to be sent over a socket with MSG_SENDPAGE_NOTLAST. This flag looks at the pending requests queued as well as queue->more_requests that is derived from the block layer last-in-batch indication. We set more_request=true when we flush the request directly from .queue_rq submission context (in nvme_tcp_send_all), however this is wrongly assuming that no other requests may be queued during the execution of nvme_tcp_send_all. Due to this, a race condition may happen where: 1. request X is queued as !last-in-batch 2. request X submission context calls nvme_tcp_send_all directly 3. nvme_tcp_send_all is preempted and schedules to a different cpu 4. request Y is queued as last-in-batch 5. nvme_tcp_send_all context sends request X+Y, however signals for both MSG_SENDPAGE_NOTLAST because queue->more_requests=true. ==> none of the requests is pushed down to the wire as the network stack is waiting for more data, both requests timeout. To fix this, we eliminate queue->more_requests and only rely on the queue req_list and send_list to be not-empty. Fixes: 122e5b9f3d37 ("nvme-tcp: optimize network stack with setting msg flags according to batch size") Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-06nvme-tcp: fix UAF when detecting digest errorsSagi Grimberg1-1/+1
We should also bail from the io_work loop when we set rd_enabled to true, so we don't attempt to read data from the socket when the TCP stream is already out-of-sync or corrupted. Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-05nvmet: fix a use-after-freeBart Van Assche1-2/+4
Fix the following use-after-free complaint triggered by blktests nvme/004: BUG: KASAN: user-memory-access in blk_mq_complete_request_remote+0xac/0x350 Read of size 4 at addr 0000607bd1835943 by task kworker/13:1/460 Workqueue: nvmet-wq nvme_loop_execute_work [nvme_loop] Call Trace: show_stack+0x52/0x58 dump_stack_lvl+0x49/0x5e print_report.cold+0x36/0x1e2 kasan_report+0xb9/0xf0 __asan_load4+0x6b/0x80 blk_mq_complete_request_remote+0xac/0x350 nvme_loop_queue_response+0x1df/0x275 [nvme_loop] __nvmet_req_complete+0x132/0x4f0 [nvmet] nvmet_req_complete+0x15/0x40 [nvmet] nvmet_execute_io_connect+0x18a/0x1f0 [nvmet] nvme_loop_execute_work+0x20/0x30 [nvme_loop] process_one_work+0x56e/0xa70 worker_thread+0x2d1/0x640 kthread+0x183/0x1c0 ret_from_fork+0x1f/0x30 Cc: stable@vger.kernel.org Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-03Merge tag 'block-6.0-2022-09-02' of git://git.kernel.dk/linux-blockLinus Torvalds3-0/+6
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - error handling fix for the new auth code (Hannes Reinecke) - fix unhandled tcp states in nvmet_tcp_state_change (Maurizio Lombardi) - add NVME_QUIRK_BOGUS_NID for Lexar NM610 (Shyamin Ayesh) - Add documentation for the ublk driver merged in this merge window (Ming) * tag 'block-6.0-2022-09-02' of git://git.kernel.dk/linux-block: Documentation: document ublk nvmet-tcp: fix unhandled tcp states in nvmet_tcp_state_change() nvmet-auth: add missing goto in nvmet_setup_auth() nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM610
2022-08-31nvmet-tcp: fix unhandled tcp states in nvmet_tcp_state_change()Maurizio Lombardi1-0/+3
TCP_FIN_WAIT2 and TCP_LAST_ACK were not handled, the connection is closing so we can ignore them and avoid printing the "unhandled state" warning message. [ 1298.852386] nvmet_tcp: queue 2 unhandled state 5 [ 1298.879112] nvmet_tcp: queue 7 unhandled state 5 [ 1298.884253] nvmet_tcp: queue 8 unhandled state 5 [ 1298.889475] nvmet_tcp: queue 9 unhandled state 5 v2: Do not call nvmet_tcp_schedule_release_queue(), just ignore the fin_wait2 and last_ack states. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-31nvmet-auth: add missing goto in nvmet_setup_auth()Hannes Reinecke1-0/+1
There's a goto missing in nvmet_setup_auth(), causing a kernel oops when nvme_auth_extract_key() fails. Reported-by: Tal Lossos <tallossos@gmail.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-31nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM610Shyamin Ayesh1-0/+2
Lexar NM610 reports bogus eui64 values that appear to be the same across all drives. Quirk them out so they are not marked as "non globally unique" duplicates. Signed-off-by: Shyamin Ayesh <me@shyamin.com> [patch formatting] Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-22block: Change the return type of blk_mq_map_queues() into voidBart Van Assche4-11/+4
Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Wang <jasowang@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-13Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-blockLinus Torvalds5-4/+18
Pull block fixes from Jens Axboe: - NVMe pull request - print nvme connect Linux error codes properly (Amit Engel) - fix the fc_appid_store return value (Christoph Hellwig) - fix a typo in an error message (Christophe JAILLET) - add another non-unique identifier quirk (Dennis P. Kliem) - check if the queue is allocated before stopping it in nvme-tcp (Maurizio Lombardi) - restart admin queue if the caller needs to restart queue in nvme-fc (Ming Lei) - use kmemdup instead of kmalloc + memcpy in nvme-auth (Zhang Xiaoxu) - __alloc_disk_node() error handling fix (Rafael) * tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block: block: Do not call blk_put_queue() if gendisk allocation fails nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70 nvme-tcp: check if the queue is allocated before stopping it nvme-fabrics: Fix a typo in an error message nvme-fabrics: parse nvme connect Linux error codes nvmet-auth: use kmemdup instead of kmalloc + memcpy nvme-fc: fix the fc_appid_store return value nvme-fc: restart admin queue if the caller needs to restart queue
2022-08-11nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70Dennis P. Kliem1-0/+2
ADATA XPG GAMMIX S70 reports bogus eui64 values that appear to be the same across all drives. Quirk them out so they are not marked as "non globally unique" duplicates. Signed-off-by: Dennis P. Kliem <dpkliem@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-10nvme-tcp: check if the queue is allocated before stopping itMaurizio Lombardi1-0/+3
When an error is detected and the host reconnects, the nvme_tcp_error_recovery_work() function is called and starts tearing down the io queues and de-allocating them; If at the same time the "nvme" process deletes the controller via sysfs, the nvme_tcp_delete_ctrl() gets called and waits until the nvme_tcp_error_recovery_work() finishes its job; then starts tearing down the io queues, but at this point they have already been freed and the mutexes are destroyed. Calling mutex_lock() against a destroyed mutex triggers a warning: [ 1299.025575] nvme nvme1: Reconnecting in 10 seconds... [ 1299.636449] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1" [ 1299.645262] ------------[ cut here ]------------ [ 1299.649949] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 1299.649971] WARNING: CPU: 4 PID: 104150 at kernel/locking/mutex.c:579 __mutex_lock+0x2d0/0x7dc [ 1299.717934] CPU: 4 PID: 104150 Comm: nvme [ 1299.828075] Call trace: [ 1299.830526] __mutex_lock+0x2d0/0x7dc [ 1299.834203] mutex_lock_nested+0x64/0xd4 [ 1299.838139] nvme_tcp_stop_queue+0x54/0xe0 [nvme_tcp] [ 1299.843211] nvme_tcp_teardown_io_queues.part.0+0x90/0x280 [nvme_tcp] [ 1299.849672] nvme_tcp_delete_ctrl+0x6c/0xf0 [nvme_tcp] [ 1299.854831] nvme_do_delete_ctrl+0x108/0x120 [nvme_core] [ 1299.860181] nvme_sysfs_delete+0xec/0xf0 [nvme_core] [ 1299.865179] dev_attr_store+0x40/0x70 Fix the warning by checking if the queues are allocated in the nvme_tcp_stop_queue(). If they are not, it makes no sense to try to stop them. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-10nvme-fabrics: Fix a typo in an error messageChristophe JAILLET1-1/+1
A 'c' is missing. s/fabris/fabrics/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-10nvme-fabrics: parse nvme connect Linux error codesAmit Engel1-0/+6
This fixes the assumption that errval is an unsigned nvme error Signed-off-by: Amit Engel <amit.engel@dell.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-10nvmet-auth: use kmemdup instead of kmalloc + memcpyZhang Xiaoxu1-2/+2
For code neat purpose, we can use kmemdup to replace kmalloc + memcpy. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-10nvme-fc: fix the fc_appid_store return valueChristoph Hellwig1-1/+2
"nvme-fc: fold t fc_update_appid into fc_appid_store" accidentally changed the userspace interface for the appid attribute, because the code that decrements "count" to remove a trailing '\n' in the parsing results in the decremented value being incorrectly be returned from the sysfs write. Fix this by keeping an orig_count variable for the full length of the write. Fixes: c814153c83a8 ("nvme-fc: fold t fc_update_appid into fc_appid_store") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Tested-by: Muneendra Kumar M <muneendra.kumar@broadcom.com>
2022-08-10nvme-fc: restart admin queue if the caller needs to restart queueMing Lei1-0/+2
Without restarting admin queue in __nvme_fc_abort_outstanding_ios(), it leaves controller not capable of handling admin pt request, and causes io hang. Fixes it by restarting admin queue if the caller of __nvme_fc_abort_outstanding_ios requires to restart queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <jsmart2021@gmail.com> Tested-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-06Merge tag 'dma-mapping-5.20-2022-08-06' of ↵Linus Torvalds4-46/+42
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - convert arm32 to the common dma-direct code (Arnd Bergmann, Robin Murphy, Christoph Hellwig) - restructure the PCIe peer to peer mapping support (Logan Gunthorpe) - allow the IOMMU code to communicate an optional DMA mapping length and use that in scsi and libata (John Garry) - split the global swiotlb lock (Tianyu Lan) - various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang, Lukas Bulwahn, Robin Murphy) * tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping: (45 commits) swiotlb: fix passing local variable to debugfs_create_ulong() dma-mapping: reformat comment to suppress htmldoc warning PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg() RDMA/rw: drop pci_p2pdma_[un]map_sg() RDMA/core: introduce ib_dma_pci_p2p_dma_supported() nvme-pci: convert to using dma_map_sgtable() nvme-pci: check DMA ops when indicating support for PCI P2PDMA iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg iommu: Explicitly skip bus address marked segments in __iommu_map_sg() dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support dma-direct: support PCI P2PDMA pages in dma-direct map_sg dma-mapping: allow EREMOTEIO return code for P2PDMA transfers PCI/P2PDMA: Introduce helpers for dma_map_sg implementations PCI/P2PDMA: Attempt to set map_type if it has not been set lib/scatterlist: add flag for indicating P2PDMA segments in an SGL swiotlb: clean up some coding style and minor issues dma-mapping: update comment after dmabounce removal scsi: sd: Add a comment about limiting max_sectors to shost optimal limit ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit ...