summaryrefslogtreecommitdiff
path: root/drivers/nvme
AgeCommit message (Collapse)AuthorFilesLines
2022-08-05Merge tag 'for-5.20/block-2022-08-04' of git://git.kernel.dk/linux-blockLinus Torvalds31-354/+3609
Pull block driver updates from Jens Axboe: - NVMe pull requests via Christoph: - add support for In-Band authentication (Hannes Reinecke) - handle the persistent internal error AER (Michael Kelley) - use in-capsule data for TCP I/O queue connect (Caleb Sander) - remove timeout for getting RDMA-CM established event (Israel Rukshin) - misc cleanups (Joel Granados, Sagi Grimberg, Chaitanya Kulkarni, Guixin Liu, Xiang wangx) - use command_id instead of req->tag in trace_nvme_complete_rq() (Bean Huo) - various fixes for the new authentication code (Lukas Bulwahn, Dan Carpenter, Colin Ian King, Chaitanya Kulkarni, Hannes Reinecke) - small cleanups (Liu Song, Christoph Hellwig) - restore compat_ioctl support (Nick Bowler) - make a nvmet-tcp workqueue lockdep-safe (Sagi Grimberg) - enable generic interface (/dev/ngXnY) for unknown command sets (Joel Granados, Christoph Hellwig) - don't always build constants.o (Christoph Hellwig) - print the command name of aborted commands (Christoph Hellwig) - MD pull requests via Song: - Improve raid5 lock contention, by Logan Gunthorpe. - Misc fixes to raid5, by Logan Gunthorpe. - Fix race condition with md_reap_sync_thread(), by Guoqing Jiang. - Fix potential deadlock with raid5_quiesce and raid5_get_active_stripe, by Logan Gunthorpe. - Refactoring md_alloc(), by Christoph" - Fix md disk_name lifetime problems, by Christoph Hellwig - Convert prepare_to_wait() to wait_woken() api, by Logan Gunthorpe; - Fix sectors_to_do bitmap issue, by Logan Gunthorpe. - Work on unifying the null_blk module parameters and configfs API (Vincent) - drbd bitmap IO error fix (Lars) - Set of rnbd fixes (Guoqing, Md Haris) - Remove experimental marker on bcache async device registration (Coly) - Series from cleaning up the bio splitting (Christoph) - Removal of the sx8 block driver. This hardware never really widespread, and it didn't receive a lot of attention after the initial merge of it back in 2005 (Christoph) - A few fixes for s390 dasd (Eric, Jiang) - Followup set of fixes for ublk (Ming) - Support for UBLK_IO_NEED_GET_DATA for ublk (ZiyangZhang) - Fixes for the dio dma alignment (Keith) - Misc fixes and cleanups (Ming, Yu, Dan, Christophe * tag 'for-5.20/block-2022-08-04' of git://git.kernel.dk/linux-block: (136 commits) s390/dasd: Establish DMA alignment s390/dasd: drop unexpected word 'for' in comments ublk_drv: add support for UBLK_IO_NEED_GET_DATA ublk_cmd.h: add one new ublk command: UBLK_IO_NEED_GET_DATA ublk_drv: cleanup ublksrv_ctrl_dev_info ublk_drv: add SET_PARAMS/GET_PARAMS control command ublk_drv: fix ublk device leak in case that add_disk fails ublk_drv: cancel device even though disk isn't up block: fix leaking page ref on truncated direct io block: ensure bio_iov_add_page can't fail block: ensure iov_iter advances for added pages drivers:md:fix a potential use-after-free bug md/raid5: Ensure batch_last is released before sleeping for quiesce md/raid5: Move stripe_request_ctx up md/raid5: Drop unnecessary call to r5c_check_stripe_cache_usage() md/raid5: Make is_inactive_blocked() helper md/raid5: Refactor raid5_get_active_stripe() block: pass struct queue_limits to the bio splitting helpers block: move bio_allowed_max_sectors to blk-merge.c block: move the call to get_max_io_size out of blk_bio_segment_split ...
2022-08-03Merge tag 'pull-work.iov_iter-base' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs iov_iter updates from Al Viro: "Part 1 - isolated cleanups and optimizations. One of the goals is to reduce the overhead of using ->read_iter() and ->write_iter() instead of ->read()/->write(). new_sync_{read,write}() has a surprising amount of overhead, in particular inside iocb_flags(). That's the explanation for the beginning of the series is in this pile; it's not directly iov_iter-related, but it's a part of the same work..." * tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: first_iovec_segment(): just return address iov_iter: massage calling conventions for first_{iovec,bvec}_segment() iov_iter: first_{iovec,bvec}_segment() - simplify a bit iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT iov_iter_bvec_advance(): don't bother with bvec_iter copy_page_{to,from}_iter(): switch iovec variants to generic keep iocb_flags() result cached in struct file iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC struct file: use anonymous union member for rcuhead and llist btrfs: use IOMAP_DIO_NOSYNC teach iomap_dio_rw() to suppress dsync No need of likely/unlikely on calls of check_copy_size()
2022-08-03block: change the blk_queue_split calling conventionChristoph Hellwig1-1/+1
The double indirect bio leads to somewhat suboptimal code generation. Instead return the (original or split) bio, and make sure the request_queue arguments to the lower level helpers is passed after the bio to avoid constant reshuffling of the argument passing registers. Also give it and the helpers used to implement it more descriptive names. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220727162300.3089193-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet-tcp: fix lockdep complaint on nvmet_tcp_wq flush during queue teardownSagi Grimberg1-1/+2
We probably need nvmet_tcp_wq to have MEM_RECLAIM as we are sending/receiving for the socket from works on this workqueue. Also this eliminates lockdep complaints: -- [ 6174.010200] workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_tcp_release_queue_work [nvmet_tcp] is flushing !WQ_MEM_RECLAIM nvmet_tcp_wq:nvmet_tcp_io_work [nvmet_tcp] [ 6174.010216] WARNING: CPU: 20 PID: 14456 at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: enable generic interface (/dev/ngXnY) for unknown command setsJoel Granados1-7/+34
Extend nvme_alloc_ns() and nvme_validate_ns() for unknown command-set as well. Both are made to use a new helper (nvme_update_ns_info_cs_indep) which is similar to nvme_update_ns_info but performs fewer operations to get the generic interface up. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> [hch: rebased on other refactoring patches] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: factor out a nvme_ns_is_readonly helperChristoph Hellwig1-5/+10
Add a little helper to check if a namespace should be marked read-only that uses a new is_readonly flag in the nvme_ns_info structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: refactor namespace probingChristoph Hellwig1-105/+125
Change nvme_ns_scan to gather all information needed for generic namespace setup into a nvme_ns_info structure. This structure is filled from the Command Set Idependent Identify Namespace data structure if it is available or else the legacy Identify namespace structure. With that everything related to the NVM command set (and the ZNS command set derived from it) can be encapsulated in the nvme_update_ns_info_block function while keeping the rest of the namespace probing generic. The downside is that we now always issue two Identify Namespace calls for each probed namespace instead of usually just a single one previously. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: generalize the nvme_multi_css check in nvme_scan_nsChristoph Hellwig1-6/+6
Check for multiple command set support early on an error out if is not supported when a !NVM command set namespace is found. This prepares for adding command set independent passthrough support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: rename nvme_validate_or_alloc_ns to nvme_scan_nsChristoph Hellwig1-3/+3
This shorter name much better fits what this function does in the scanning process. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: catch -ENODEV from nvme_revalidate_zones againChristoph Hellwig1-6/+7
nvme_revalidate_zones can also return -ENODEV if e.g. zone sizes aren't constant or not a power of two. In that case we should jump to marking the gendisk hidden and only support pass through. Fixes: 602e57c9799c ("nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info") Reported-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet-auth: select the intended CRYPTO_DH_RFC7919_GROUPSLukas Bulwahn1-1/+1
Commit 71ebe3842ebe ("nvmet-auth: Diffie-Hellman key exchange support") intends to select 'Support for RFC 7919 FFDHE group parameters' for using FFDHE groups for NVMe In-Band Authentication. It however selects CRYPTO_DH_GROUPS_RFC7919, instead of the intended CRYPTO_DH_RFC7919_GROUPS; notice the swapping of words here. Correct the select to the intended config option. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet-auth: fix return value check in auth receiveChaitanya Kulkarni1-2/+1
nvmet_auth_challenge() return type is int and currently it uses status variable that is of type u16 in nvmet_execute_auth_receive(). Catch the return value of nvmet_auth_challenge() into int and set the NVME_SC_INTERNAL as status variable before we jump to error. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet-auth: fix return value check in auth sendChaitanya Kulkarni1-2/+2
nvmet_setup_auth() return type is int and currently it uses status variable that is of type u16 in nvmet_execute_auth_send(). Catch the return value of nvmet_setup_auth() into int and set the NVME_SC_INTERNAL as status variable before we jump to error. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet-auth: fix a couple of spelling mistakesColin Ian King2-2/+2
There are a couple of spelling mistakes in pr_warn and pr_debug messages. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet: fix a format specifier in nvmet_auth_ctrl_exponentialChristoph Hellwig1-1/+1
dh_keysize is a size_t, use the proper format specifier for printing it. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@sues.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet: don't check for NULL pointer before kfree in nvmet_host_releaseChristoph Hellwig1-2/+2
And add an empty line after the variable declaration. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-apple: stop casting function pointer signaturesChristoph Hellwig1-6/+15
Casting function pointers breaks control flow enforcement and is generally a horrible coding style. Add two wrappers to get rid of these casts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-tcp: split nvme_tcp_alloc_tagsetChristoph Hellwig1-41/+41
Split nvme_tcp_alloc_tagset into one helper for the admin tag_set and one for the I/O tag set. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-rdma: split nvme_rdma_alloc_tagsetChristoph Hellwig1-46/+46
Split nvme_rdma_alloc_tagset into one helper for the admin tag_set and one for the I/O tag set. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-pci: split nvme_dev_addChristoph Hellwig1-35/+37
Split nvme_dev_add into a helper to actually allocate the tag set, and one that just update the number of queues. Add a local variable for the tag_set to clean up the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-pci: split nvme_alloc_admin_tagsChristoph Hellwig1-29/+31
Split nvme_alloc_admin_tags into a helper to actually allocate the tag set, and one that just restarts the admin queue. Add a local variable for the tag_set to clean up the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-pci: print the command name of aborted commandsChristoph Hellwig2-2/+5
To allow for slightly better debugging, print the command name when aborting an command. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-pci: remove useless assignment in nvme_pci_setup_prpsLiu Song1-1/+0
If prp_list is NULL, nvme_unmap_sg will be performed, and the assignment to first_dma is meaningless, so remove it. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-auth: uninitialized variable in nvme_auth_transform_key()Dan Carpenter1-9/+16
A couple of the early error gotos call kfree_sensitive(transformed_key); before "transformed_key" has been initialized. Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-auth: fix off by one checksDan Carpenter1-5/+5
The > ARRAY_SIZE() checks need to be >= ARRAY_SIZE() to prevent reading one element beyond the end of the arrays. Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: define compat_ioctl again to unbreak 32-bit userspace.Nick Bowler2-0/+2
Commit 89b3d6e60550 ("nvme: simplify the compat ioctl handling") removed the initialization of compat_ioctl from the nvme block_device_operations structures. Presumably the expectation was that 32-bit ioctls would be directed through the regular handler but this is not the case: failing to assign .compat_ioctl actually means that the compat case is disabled entirely, and any attempt to submit nvme ioctls from 32-bit userspace fails outright with -ENOTTY. For example: % smartctl -x /dev/nvme0n1 [...] Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Inappropriate ioctl for device The blkdev_compat_ptr_ioctl helper can be used to direct compat calls through the main ioctl handler and makes things work again. Fixes: 89b3d6e60550 ("nvme: simplify the compat ioctl handling") Signed-off-by: Nick Bowler <nbowler@draconx.ca> Reviewed-by: Guixin Liu <kanie@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: don't always build constants.oChristoph Hellwig2-3/+2
The entire content of constants.c if guarded by an ifdef, so switch to just building the file conditionally instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: use command_id instead of req->tag in trace_nvme_complete_rq()Bean Huo1-1/+1
Use command_id instead of req->tag in trace_nvme_complete_rq(), because of commit e7006de6c238 ("nvme: code command_id with a genctr for use authentication after release"), cmd->common.command_id is set to ((genctl & 0xf)< 12 | req->tag), no longer req->tag, which makes cid in trace_nvme_complete_rq and trace_nvme_setup_cmd are not the same. Fixes: e7006de6c238 ("nvme: code command_id with a genctr for use authentication after release") Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-multipath: refactor nvme_mpath_add_diskJoel Granados3-7/+6
Pass anagrpid as second argument. This is prep patch that allows reusing this function for supporting unknown command sets. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-apple: use nvme core helper to cancel requests in tagsetGuixin Liu1-5/+2
Use nvme core helper nvme_cancel_tagset and nvme_cancel_admin_tagset instead of same logic code. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-pci: use nvme core helper to cancel requests in tagsetGuixin Liu1-4/+2
Use nvme core helper nvme_cancel_tagset and nvme_cancel_admin_tagset instead of same logic code. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ruozhu Li <liruozhu@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-tcp: use in-capsule data for I/O connectCaleb Sander1-5/+7
Currently, command data is only sent in-capsule on the for admin or I/O commands on queues that indicate support for it. Send fabrics command data in-capsule for I/O queues as well to avoid needing a separate H2CData PDU for the connect command. This is optimization. Without this change, we send the connect command capsule and data in separate PDUs (CapsuleCmd and H2CData), and must wait for the controller to respond with an R2T PDU before sending the H2CData. With the change, we send a single CapsuleCmd PDU that includes the data. This reduces the number of bytes (and likely packets) sent across the network, and simplifies the send state machine handling in the driver. Signed-off-by: Caleb Sander <csander@purestorage.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-rdma: remove timeout for getting RDMA-CM established eventIsrael Rukshin1-8/+5
In case many controllers start error recovery at the same time (i.e., when port is down and up), they may never succeed to reconnect again. This is because the target can't handle all the connect requests at three seconds (the arbitrary value set today). Even if some of the connections are established, when a single queue fails to connect, all the controller's queues are destroyed as well. So, on the following reconnection attempts the number of connect requests may remain the same. To fix this, remove the timeout and wait for RDMA-CM event to abort/complete the connect request. RDMA-CM sends unreachable event when a timeout of ~90 seconds is expired. This approach is used at other RDMA-CM users like SRP and iSER at blocking mode. The commit also renames NVME_RDMA_CONNECT_TIMEOUT_MS to NVME_RDMA_CM_TIMEOUT_MS. Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet-auth: expire authentication sessionsHannes Reinecke3-1/+21
Each authentication step is required to be completed within the KATO interval (or two minutes if not set). So add a workqueue function to reset the transaction ID and the expected next protocol step; this will automatically the next authentication command referring to the terminated authentication. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet-auth: Diffie-Hellman key exchange supportHannes Reinecke5-8/+232
Implement Diffie-Hellman key exchange using FFDHE groups for NVMe In-Band Authentication. This patch adds a new host configfs attribute 'dhchap_dhgroup' to select the FFDHE group to use. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet: implement basic In-Band AuthenticationHannes Reinecke9-3/+1100
Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006. This patch adds three additional configfs entries 'dhchap_key', 'dhchap_ctrl_key', and 'dhchap_hash' to the 'host' configfs directory. The 'dhchap_key' and 'dhchap_ctrl_key' entries need to be in the ASCII format as specified in NVMe Base Specification v2.0 section 8.13.5.8 'Secret representation'. 'dhchap_hash' defaults to 'hmac(sha256)', and can be written to to switch to a different HMAC algorithm. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvmet: parse fabrics commands on io queuesHannes Reinecke4-3/+23
Some fabrics commands can be sent via io queues, so add a new function nvmet_parse_fabrics_io_cmd() and rename the existing nvmet_parse_fabrics_cmd() to nvmet_parse_fabrics_admin_cmd(). Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-auth: Diffie-Hellman key exchange supportHannes Reinecke3-6/+350
Implement Diffie-Hellman key exchange using FFDHE groups for NVMe In-Band Authentication. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: implement In-Band authenticationHannes Reinecke15-7/+1465
Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006. This patch adds two new fabric options 'dhchap_secret' to specify the pre-shared key (in ASCII respresentation according to NVMe 2.0 section 8.13.5.8 'Secret representation') and 'dhchap_ctrl_secret' to specify the pre-shared controller key for bi-directional authentication of both the host and the controller. Re-authentication can be triggered by writing the PSK into the new controller sysfs attribute 'dhchap_secret' or 'dhchap_ctrl_secret'. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> [axboe: fold in clang build fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-fabrics: decode 'authentication required' connect errorHannes Reinecke1-0/+4
The 'connect' command might fail with NVME_SC_AUTH_REQUIRED, so we should be decoding this error, too. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme-loop: use nvme core helpers to cancel all requests in a tagsetSagi Grimberg1-6/+2
A helper now exist, no need to open-code the same functionality. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: fix qid param blk_mq_alloc_request_hctxChaitanya Kulkarni1-1/+1
Only caller of the __nvme_submit_sync_cmd() with qid value not equal to NVME_QID_ANY is nvmf_connect_io_queues(), where qid value is alway set to > 0. [1] __nvme_submit_sync_cmd() callers with qid parameter from :- Caller | qid parameter ------------------------------------------------------ * nvme_fc_connect_io_queues() | nvmf_connect_io_queue() | qid > 0 * nvme_rdma_start_io_queues() | nvme_rdma_start_queue() | nvmf_connect_io_queues() | qid > 0 * nvme_tcp_start_io_queues() | nvme_tcp_start_queue() | nvmf_connect_io_queues() | qid > 0 * nvme_loop_connect_io_queues() | nvmf_connect_io_queues() | qid > 0 When qid value of the function parameter __nvme_submit_sync_cmd() is > 0 from above callers, we use blk_mq_alloc_request_hctx(), where we pass last parameter as 0 if qid functional parameter value is set to 0 with conditional operators, see 1002 :- 991 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd, 992 union nvme_result *result, void *buffer, unsigned bufflen, 993 int qid, int at_head, blk_mq_req_flags_t flags) 994 { 995 struct request *req; 996 int ret; 997 998 if (qid == NVME_QID_ANY) 999 req = blk_mq_alloc_request(q, nvme_req_op(cmd), flags); 1000 else 1001 req = blk_mq_alloc_request_hctx(q, nvme_req_op(cmd), flags, 1002 qid ? qid - 1 : 0); 1003 But qid function parameter value of the __nvme_submit_sync_cmd() will never be 0 from above caller list see [1], and all the other callers of __nvme_submit_sync_cmd() use NVME_QID_ANY as qid value :- 1. nvme_submit_sync_cmd() 2. nvme_features() 3. nvme_sec_submit() 4. nvmf_reg_read32() 5. nvmf_reg_read64() 6. nvmf_ref_write32() 7. nvmf_connect_admin_queue() Remove the conditional operator to pass the qid as 0 in the call to blk_mq_alloc_requst_hctx(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: remove unused timeout parameterChaitanya Kulkarni3-14/+10
The function __nvme_submit_sync_cmd() has following list of callers that sets the timeout value to 0 :- Callers | Timeout value ------------------------------------------------ nvme_submit_sync_cmd() | 0 nvme_features() | 0 nvme_sec_submit() | 0 nvmf_reg_read32() | 0 nvmf_reg_read64() | 0 nvmf_reg_write32() | 0 nvmf_connect_admin_queue() | 0 nvmf_connect_io_queue() | 0 Remove the timeout function parameter from __nvme_submit_sync_cmd() and adjust the rest of code accordingly. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: handle the persistent internal error AERMichael Kelley1-2/+29
In the NVM Express Revision 1.4 spec, Figure 145 describes possible values for an AER with event type "Error" (value 000b). For a Persistent Internal Error (value 03h), the host should perform a controller reset. Add support for this error using code that already exists for doing a controller reset. As part of this support, introduce two utility functions for parsing the AER type and subtype. This new support was tested in a lab environment where we can generate the persistent internal error on demand, and observe both the Linux side and NVMe controller side to see that the controller reset has been done. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03nvme: remove a double word in a commentXiang wangx1-1/+1
Delete the redundant word 'be'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-blockLinus Torvalds13-69/+64
Pull block updates from Jens Axboe: - Improve the type checking of request flags (Bart) - Ensure queue mapping for a single queues always picks the right queue (Bart) - Sanitize the io priority handling (Jan) - rq-qos race fix (Jinke) - Reserved tags handling improvements (John) - Separate memory alignment from file/disk offset aligment for O_DIRECT (Keith) - Add new ublk driver, userspace block driver using io_uring for communication with the userspace backend (Ming) - Use try_cmpxchg() to cleanup the code in various spots (Uros) - Finally remove bdevname() (Christoph) - Clean up the zoned device handling (Christoph) - Clean up independent access range support (Christoph) - Clean up and improve block sysfs handling (Christoph) - Clean up and improve teardown of block devices. This turns the usual two step process into something that is simpler to implement and handle in block drivers (Christoph) - Clean up chunk size handling (Christoph) - Misc cleanups and fixes (Bart, Bo, Dan, GuoYong, Jason, Keith, Liu, Ming, Sebastian, Yang, Ying) * tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block: (178 commits) ublk_drv: fix double shift bug ublk_drv: make sure that correct flags(features) returned to userspace ublk_drv: fix error handling of ublk_add_dev ublk_drv: fix lockdep warning block: remove __blk_get_queue block: call blk_mq_exit_queue from disk_release for never added disks blk-mq: fix error handling in __blk_mq_alloc_disk ublk: defer disk allocation ublk: rewrite ublk_ctrl_get_queue_affinity to not rely on hctx->cpumask ublk: fold __ublk_create_dev into ublk_ctrl_add_dev ublk: cleanup ublk_ctrl_uring_cmd ublk: simplify ublk_ch_open and ublk_ch_release ublk: remove the empty open and release block device operations ublk: remove UBLK_IO_F_PREFLUSH ublk: add a MAINTAINERS entry block: don't allow the same type rq_qos add more than once mmc: fix disk/queue leak in case of adding disk failure ublk_drv: fix an IS_ERR() vs NULL check ublk: remove UBLK_IO_F_INTEGRITY ublk_drv: remove unneeded semicolon ...
2022-07-26RDMA/core: introduce ib_dma_pci_p2p_dma_supported()Logan Gunthorpe1-1/+1
Introduce the helper function ib_dma_pci_p2p_dma_supported() to check if a given ib_device can be used in P2PDMA transfers. This ensures the ib_device is not using virt_dma and also that the underlying dma_device supports P2PDMA. Use the new helper in nvme-rdma to replace the existing check for ib_uses_virt_dma(). Adding the dma_pci_p2pdma_supported() check allows switching away from pci_p2pdma_[un]map_sg(). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-26nvme-pci: convert to using dma_map_sgtable()Logan Gunthorpe1-40/+29
The dma_map operations now support P2PDMA pages directly. So remove the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls to dma_map_sgtable(). dma_map_sgtable() returns more complete error codes than dma_map_sg() and allows differentiating EREMOTEIO errors in case an unsupported P2PDMA transfer is requested. When this happens, return BLK_STS_TARGET so the request isn't retried. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-26nvme-pci: check DMA ops when indicating support for PCI P2PDMALogan Gunthorpe3-5/+12
Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops flags can be checked for PCI P2PDMA support. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-25nvme-pci: Crucial P2 has bogus namespace idsTobias Gruetzmacher1-0/+2
This adds a quirk for the Crucial P2. Signed-off-by: Tobias Gruetzmacher <tobias-git@23.gs> Signed-off-by: Christoph Hellwig <hch@lst.de>