summaryrefslogtreecommitdiff
path: root/drivers/nvme
AgeCommit message (Collapse)AuthorFilesLines
2016-07-27Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds22-86/+9081
Pull block driver updates from Jens Axboe: "This branch also contains core changes. I've come to the conclusion that from 4.9 and forward, I'll be doing just a single branch. We often have dependencies between core and drivers, and it's hard to always split them up appropriately without pulling core into drivers when that happens. That said, this contains: - separate secure erase type for the core block layer, from Christoph. - set of discard fixes, from Christoph. - bio shrinking fixes from Christoph, as a followup up to the op/flags change in the core branch. - map and append request fixes from Christoph. - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty exciting! - nvme-loop fixes from Arnd. - removal of ->driverfs_dev from Dan, after providing a device_add_disk() helper. - bcache fixes from Bhaktipriya and Yijing. - cdrom subchannel read fix from Vchannaiah. - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier. - set of drbd updates and fixes from Fabian, Lars, and Philipp. - mg_disk error path fix from Bart. - user notification for failed device add for loop, from Minfei. - NVMe in general: + NVMe delay quirk from Guilherme. + SR-IOV support and command retry limits from Keith. + fix for memory-less NUMA node from Masayoshi. + use UINT_MAX for discard sectors, from Minfei. + cancel IO fixes from Ming. + don't allocate unused major, from Neil. + error code fixup from Dan. + use constants for PSDT/FUSE from James. + variable init fix from Jay. + fabrics fixes from Ming, Sagi, and Wei. + various fixes" * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits) nvme/pci: Provide SR-IOV support nvme: initialize variable before logical OR'ing it block: unexport various bio mapping helpers scsi/osd: open code blk_make_request target: stop using blk_make_request block: simplify and export blk_rq_append_bio block: ensure bios return from blk_get_request are properly initialized virtio_blk: use blk_rq_map_kern memstick: don't allow REQ_TYPE_BLOCK_PC requests block: shrink bio size again block: simplify and cleanup bvec pool handling block: get rid of bio_rw and READA block: don't ignore -EOPNOTSUPP blkdev_issue_write_same block: introduce BLKDEV_DISCARD_ZERO to fix zeroout NVMe: don't allocate unused nvme_major nvme: avoid crashes when node 0 is memoryless node. nvme: Limit command retries loop: Make user notify for adding loop device failed nvme-loop: fix nvme-loop Kconfig dependencies nvmet: fix return value check in nvmet_subsys_alloc() ...
2016-07-27Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-blockLinus Torvalds2-4/+4
Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...
2016-07-21nvme/pci: Provide SR-IOV supportKeith Busch1-0/+19
This registers an sr-iov callback for nvme. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-21nvme: initialize variable before logical OR'ing itJay Freyensee1-0/+1
It is typically not good coding or secure coding practice to logical OR a variable without an initialization value first. Here on this line: integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE; BLK_INTEGRITY_DEVICE_CAPABLE is being OR'ed to a member variable never set to an initial value. This patch fixes that. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-21block: ensure bios return from blk_get_request are properly initializedChristoph Hellwig1-4/+0
blk_get_request is used for BLOCK_PC and similar passthrough requests. Currently we always need to call blk_rq_set_block_pc or an open coded version of it to allow appending bios using the request mapping helpers later on, which is a somewhat awkward API. Instead move the initialization part of blk_rq_set_block_pc into blk_get_request, so that we always have a safe to use request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-21block: get rid of bio_rw and READAChristoph Hellwig1-1/+1
These two are confusing leftover of the old world order, combining values of the REQ_OP_ and REQ_ namespaces. For callers that don't special case we mostly just replace bi_rw with bio_data_dir or op_is_write, except for the few cases where a switch over the REQ_OP_ values makes more sense. Any check for READA is replaced with an explicit check for REQ_RAHEAD. Also remove the READA alias for REQ_RAHEAD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-14NVMe: don't allocate unused nvme_majorNeilBrown1-15/+1
When alloc_disk(0) is used, the ->major number is ignored. All device numbers are allocated with a major of BLOCK_EXT_MAJOR. So remove all references to nvme_major. [akpm@linux-foundation.org: one unregister_blkdev() was missed] Link: http://lkml.kernel.org/r/20160602064318.4403.93301.stgit@noble Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-14nvme: Remove RCU namespace protectionKeith Busch1-35/+39
We can't sleep with RCU read lock held, but we need to do potentially blocking stuff to namespace queues when iterating the list. This patch removes the RCU locking and holds a mutex instead. To prevent deadlocks, this patch removes holding the mutex during namespace scanning and removal. The unlocked namespace scanning is made safe by holding a reference to the namespace being scanned. List iteration that does IO has to be unlocked to allow error recovery. The caller must ensure the list can not be manipulated during such an event, so this patch adds a comment explaining this requirement to the only function that iterates an unlocked list. All callers currently meet this requirement, so no further changes required. List iterations that do not do IO can safely use the lock since it couldn't block recovery from missing forced IO completions. Reported-by: Ming Lin <mlin at kernel.org> [fixes 0bf77e9 nvme: switch to RCU freeing the namespace] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-13nvme: avoid crashes when node 0 is memoryless node.Masayoshi Mizuma1-1/+1
When CONFIG_NUMA is enabled and node 0 is memoryless, the system crashes because nvme_probe() sets the device->numa_node to 0 by set_dev_node(&pdev->dev, 0), so it tries to allocate memory from node 0. To avoid the crash, we should change the 0 to first_memory_node. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-13nvme: Limit command retriesKeith Busch3-1/+15
Many controller implementations will return errors to commands that will not succeed, but without the DNR bit set. The driver previously retried these commands an unlimited number of times until the command timeout has exceeded, which takes an unnecessarilly long period of time. This patch limits the number of retries a command can have, defaulting to 5, but is user tunable at load or runtime. The struct request's 'retries' field is used to track the number of retries attempted. This is in contrast with scsi's use of this field, which indicates how many retries are allowed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12nvme-loop: fix nvme-loop Kconfig dependenciesArnd Bergmann1-3/+2
I ran into the same problem on NVME_TARGET_RDMA now, which otherwise needs dependencies on both CONFIG_BLOCK and CONFIGFS_FS: warning: (NVME_TARGET_LOOP && NVME_TARGET_RDMA) selects NVME_TARGET which has unmet direct dependencies (BLOCK && CONFIGFS_FS) 0xA002B368 Mon Jul 11 18:00:45 CEST 2016 failed In file included from ../drivers/nvme/target/core.c:16:0: drivers/nvme/target/nvmet.h:222:14: error: field 'inline_bio' has incomplete type struct bio inline_bio; ^~~~~~~~~~ drivers/nvme/target/core.c: In function 'nvmet_async_event_work': drivers/nvme/target/core.c:98:3: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration] kfree(aen); ^~~~~ ../drivers/nvme/target/core.c: In function 'nvmet_ns_enable': ../drivers/nvme/target/core.c:269:13: error: implicit declaration of function 'blkdev_get_by_path' [-Werror=implicit-function-declaration] ns->bdev = blkdev_get_by_path(ns->device_path, FMODE_READ | FMODE_WRITE, Folding in my patch below should address that too. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12nvmet: fix return value check in nvmet_subsys_alloc()Wei Yongjun1-1/+1
In case of error, the function kstrndup() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12nvme-fabrics: add-remove ctrl repeat fixMing Lin1-0/+4
Repeatedly adding then removing the same NVMe-over-Fabrics controller over and over again (shown below) can cause a kernel crash (also shown below). This patch fixes that. [nvmf]# ./setup_nvme_connections.sh traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside -nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics [nvmf]# ./remove_nvme_connections.sh 2 echo 1 > /sys/class/nvme/nvme0/delete_controller echo 1 > /sys/class/nvme/nvme1/delete_controller [nvmf]# ./setup_nvme_connections.sh traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics Killed [nvmf]# dmesg [ 313.416908] nvme nvme0: creating 16 I/O queues. [ 313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr 192.168.1.100:4420 [ 313.524857] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [ 313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30 [ 313.525490] PGD 0 [ 313.525726] Oops: 0000 [#1] SMP [ 313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_en mlx4_ib ib_core mlx4_core [ 313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted 4.7.0-rc2+ #2 [ 313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT -F/IBQF/IBFF, BIOS 1.0a 10/09/2012 [ 313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti: ffff88025b980000 [ 313.527879] RIP: 0010:[<ffffffff8136c60e>] [<ffffffff8136c60e>] strcmp+0xe/0x30 [ 313.528232] RSP: 0018:ffff88025b983db0 EFLAGS: 00010206 [ 313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX: fffffffffffffff1 [ 313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI: 0000000000000011 [ 313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09: ffff880471879058 [ 313.528956] R10: 000000000000002c R11: ffff88047f415000 R12: ffff880471879800 [ 313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15: fffffffffffffff8 [ 313.529303] FS: 00007f778f510700(0000) GS:ffff88047fbc0000(0000) knlGS:0000000000000000 [ 313.529629] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4: 00000000000406e0 [ 313.529989] Stack: [ 313.530154] ffff88025b983e48 ffffffffa0171c74 0000000000000001 0000000000000059 [ 313.530621] ffff880476f32400 ffff88047e8add80 0000010074b33aa0 ffff880471879059 [ 313.531162] ffff88047187904b ffff880471879058 0000000000000000 ffff88047736e000 [ 313.531629] Call Trace: [ 313.531797] [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840 [nvme_fabrics] [ 313.531974] [<ffffffff81180b53>] __vfs_write+0x23/0x120 [ 313.532146] [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0 [ 313.532316] [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170 [ 313.532487] [<ffffffff811811f3>] vfs_write+0xb3/0x1b0 [ 313.532658] [<ffffffff8117e321>] ? filp_close+0x51/0x70 [ 313.532845] [<ffffffff811824e1>] SyS_write+0x41/0xa0 [ 313.533016] [<ffffffff8183055b>] entry_SYSCALL_64_fastpath+0x13/0x8f [ 313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01 84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 83 c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31 [ 313.536563] RIP [<ffffffff8136c60e>] strcmp+0xe/0x30 [ 313.536815] RSP <ffff88025b983db0> [ 313.536981] CR2: 0000000000000010 [ 313.537151] ---[ end trace 3d952e590e7bc2d5 ]--- Reported-and-tested-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <mlin@kernel.org> Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12nvme-fabrics: Remove tl_retry_countSagi Grimberg2-17/+0
The timeout before error recovery logic kicks in is dictated by the nvme keep-alive, so we don't really need a transport layer retry count. transports can retry for as much as they like. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12nvme-rdma: Don't use tl_retry_countSagi Grimberg1-6/+3
Always use the maximum qp retry count as the error recovery timeout is dictated from the nvme keep-alive. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12nvme-rdma: fix the return value of nvme_rdma_reinit_request()Wei Yongjun1-1/+1
PTR_ERR should be applied before its argument is reassigned, otherwise the return value will be set to 0, not error code. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12nvme/quirk: Add a delay before checking for adapter readinessGuilherme G. Piccoli3-0/+24
When disabling the controller, the specification says the register NVME_REG_CC should be written and then driver needs to wait the adapter to be ready, which is checked by reading another register bit (NVME_CSTS_RDY). There's a timeout validation in this checking, so in case this timeout is reached the driver gives up and removes the adapter from the system. After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003) (HGST adapter) end up being removed if we issue a reset_controller, because driver keeps verifying the NVME_REG_CSTS until the timeout is reached. This patch adds a necessary quirk for this adapter, by introducing a delay before nvme_wait_ready(), so the reset procedure is able to be completed. This quirk is needed because just increasing the timeout is not enough in case of this adapter - the driver must wait before start reading NVME_REG_CSTS register on this specific device. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-09Merge branch 'for-4.8/block' of ↵Jens Axboe1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm into for-4.8/drivers Dan writes: "The removal of ->driverfs_dev in favor of just passing the parent device in as a parameter to add_disk(). See below, it has received a "Reviewed-by" from Christoph, Bart, and Johannes. It is also a pre-requisite for Fam Zheng's work to cleanup gendisk uevents vs attribute visibility [1]. We would extend device_add_disk() to take an attribute_group list. This is based off a branch of block.git/for-4.8/drivers and has received a positive build success notification from the kbuild robot across several configs. [1]: "gendisk: Generate uevent after attribute available" http://marc.info/?l=linux-virtualization&m=146725201522201&w=2"
2016-07-08nvme-rdma: add a NVMe over Fabrics RDMA host driverChristoph Hellwig3-0/+2040
This patch implements the RDMA host (initiator in SCSI speak) driver. It can be used to connect to remote NVMe over Fabrics controllers over Infiniband, RoCE or iWarp, and uses the existing NVMe core driver as well a the new fabrics library. To connect to all NVMe over Fabrics controller reachable on a given taget port using RDMA/CM use the following command: nvme connect-all -t rdma -a $IPADDR This requires the latest version of nvme-cli with Fabrics support. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-08nvmet-rdma: add a NVMe over Fabrics RDMA target driverChristoph Hellwig3-0/+1460
This patch implements the RDMA transport for the NVMe over Fabrics target, which allows exporting NVMe over Fabrics functionality over RDMA fabrics (Infiniband, RoCE, iWARP). All NVMe logic is in the generic target and this module just provides a small glue between it and the generic code in the RDMA subsystem. Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>, Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-08nvme: add new reconnecting controller stateChristoph Hellwig2-0/+13
The nvme fabric (RDMA, FC, etc...) can introduce port, link or node failures that may require a reconnect to re-establish the connection. Add a new reconnecting state that will initially be used by the RDMA driver. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-07nvme: lightnvm: make MLC num_pairs little endianJohannes Thumshirn1-1/+1
According to the OpenChannel SSD interface specification the NAND flash MLC page pairing information's number of page page pairings field is the first two bytes in the MLC Page Pairing data structure. The hardware's data structure itself is little endian so annotate it as such, like the rest of lighnvm's data structures. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-07nvmet: fix an error codeDan Carpenter1-2/+2
We accidentally return zero here when ERR_PTR(-ENOMEM) is intended. Fixes: a07b4970f464 ('nvmet: add a generic NVMe target') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-07nvme-loop: add configfs dependencyArnd Bergmann1-0/+1
CONFIG_NVME_TARGET has a correct CONFIG_CONFIGFS_FS dependency, but the newly added NVME_TARGET_LOOP is missing this, resulting in a link failure: drivers/nvme/built-in.o: In function `nvmet_init_configfs': loop.c:(.init.text+0x2a0): undefined reference to `config_group_init' loop.c:(.init.text+0x2c0): undefined reference to `config_group_init_type_name' loop.c:(.init.text+0x318): undefined reference to `configfs_register_subsystem' drivers/nvme/built-in.o: In function `nvmet_exit_configfs': loop.c:(.exit.text+0x9c): undefined reference to `configfs_unregister_subsystem' This adds the same dependency here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 3a85a5de29ea ("nvme-loop: add a NVMe loopback host driver") Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme-loop: add a NVMe loopback host driverChristoph Hellwig3-0/+766
This patch implements adds nvme-loop which allows to access local devices exported as NVMe over Fabrics namespaces. This module can be useful for easy evaluation, testing and also feature experimentation. To createa nvme-loop device you need to configure the NVMe target to export a loop port (see the nvmetcli documentaton for that) and then connect to it using nvme connect-all -t loop which requires the very latest nvme-cli version with Fabrics support. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvmet: add a generic NVMe targetChristoph Hellwig11-0/+3376
This patch introduces a implementation of NVMe subsystems, controllers and discovery service which allows to export NVMe namespaces across fabrics such as Ethernet, FC etc. The implementation conforms to the NVMe 1.2.1 specification and interoperates with NVMe over fabrics host implementations. Configuration works using configfs, and is best performed using the nvmetcli tool from http://git.infradead.org/users/hch/nvmetcli.git, which also has a detailed explanation of the required steps in the README file. Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com> Signed-off-by: Anthony Knapp <anthony.j.knapp@intel.com> Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme: add keep-alive supportSagi Grimberg4-1/+119
Periodic keep-alive is a mandatory feature in NVMe over Fabrics, and optional in NVMe 1.2.1 for PCIe. This patch adds periodic keep-alive sent from the host to verify that the controller is still responsive and vice-versa. The keep-alive timeout is user-defined (with keep_alive_tmo connection parameter) and defaults to 5 seconds. In order to avoid a race condition where the host sends a keep-alive competing with the target side keep-alive timeout expiration, the host adds a grace period of 10 seconds when publishing the keep-alive timeout to the target. In case a keep-alive failed (or timed out), a transport specific error recovery kicks in. For now only NVMe over Fabrics is wired up to support keep alive, but we can add PCIe support easily once controllers actually supporting it become available. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@chelsio.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme-fabrics: add a generic NVMe over Fabrics libraryChristoph Hellwig6-1/+1098
The NVMe over Fabrics library provides an interface for both transports and the nvme core to handle fabrics specific commands and attributes independent of the underlying transport. In addition, the fabrics library adds a misc device interface that allow actually creating a fabrics controller, as we can't just autodiscover it like in the PCI case. The nvme-cli utility has been enhanced to use this interface to support fabric connect and discovery. Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>, Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>, Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme.h: add NVMe over Fabrics definitionsChristoph Hellwig2-4/+4
The NVMe over Fabrics specification defines a protocol interface and related extensions to NVMe that enable operation over network protocols. The NVMe over Fabrics specification has an NVMe Transport binding for each NVMe Transport. This patch adds the fabrics related definitions: - fabric specific command set and error codes - transport addressing and binding definitions - fabrics sgl extensions - controller identification fabrics enhancements - discovery log page definition Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme: add fabrics sysfs attributesMing Lin3-3/+78
- delete_controller: This attribute allows to delete a controller. A driver is not obligated to support it (pci doesn't) so it is created only if the driver supports it. The new fabrics drivers will support it (essentialy a disconnect operation). Usage: echo > /sys/class/nvme/nvme0/delete_controller - subsysnqn: This attribute shows the subsystem nqn of the configured device. If a driver does not implement the get_subsysnqn method, the file will not appear in sysfs. - transport: This attribute shows the transport name. Added a "name" field to struct nvme_ctrl_ops. For loop, cat /sys/class/nvme/nvme0/transport loop For RDMA, cat /sys/class/nvme/nvme0/transport rdma For PCIe, cat /sys/class/nvme/nvme0/transport pcie - address: This attributes shows the controller address. The fabrics drivers that will implement get_address can show the address of the connected controller. example: cat /sys/class/nvme/nvme0/address traddr=192.168.2.2,trsvcid=1023 Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme: Modify and export sync command submission for fabricsChristoph Hellwig3-13/+23
NVMe over fabrics will use __nvme_submit_sync_cmd in the the transport and require a few tweaks to it. For that we export it and add a few more paramters: 1. allow passing a queue ID to the block layer For the NVMe over Fabrics connect command we need to able to specify a queue ID that we want to send the command on. Add a qid parameter to the relevant functions to enable this behavior. 2. allow submitting at_head commands In cases where we want to (re)connect to a controller where we have inflight queued commands we want to first connect and only then allow the other queued commands to be kicked. This will prevents failures in controller resets and reconnects. 3. allow passing flags to blk_mq_allocate_request Both for Fabrics connect the the keep-alive feature in NVMe 1.2.1 we want to be able to use reserved requests. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme: allow transitioning from NEW to LIVE stateChristoph Hellwig1-0/+1
For Fabrics we're not going through an intermediate reset state (at least for now). Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-27block: convert to device_add_disk()Dan Williams1-2/+1
For block drivers that specify a parent device, convert them to use device_add_disk(). This conversion was done with the following semantic patch: @@ struct gendisk *disk; expression E; @@ - disk->driverfs_dev = E; ... - add_disk(disk); + device_add_disk(E, disk); @@ struct gendisk *disk; expression E1, E2; @@ - disk->driverfs_dev = E1; ... E2 = disk; ... - add_disk(E2); + device_add_disk(E1, E2); ...plus some manual fixups for a few missed conversions. Cc: Jens Axboe <axboe@fb.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-12nvme: move the workaround for I/O queue-less controllers from PCIe to coreChristoph Hellwig2-12/+15
We want to apply this to Fabrics drivers as well, so move it to common code. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-12nvme: factor out a add nvme_is_write helperChristoph Hellwig1-3/+2
Centralize the check if a given NVMe command reads or writes data. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-12nvme: allow for size limitations from transport driversChristoph Hellwig1-2/+5
Some transport drivers may have a lower transfer size than the controller. So allow the transport to set it in the controller max_hw_sectors. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-09NVMe: Only release requested regionsJohannes Thumshirn1-2/+7
The NVMe driver only requests the PCIe device's memory regions but releases all possible regions (including eventual I/O regions). This leads to a stale warning entry in dmesg about freeing non existent resources. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-08NVMe: Fix removal in case of active namespace list scanning methodSunad Bhandary1-6/+15
In case of the active namespace list scanning method, a namespace that is detached is not removed from the host if it was the last entry in the list. Fix this by adding a scan to validate namespaces greater than the value of prev. This also handles the case of removing namespaces whose value exceed the device's reported number of namespaces. Signed-off-by: Sunad Bhandary S <sunad.s@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-08nvme: use UINT_MAX for max discard sectorsMinfei Huang1-1/+1
It's more elegant to use UINT_MAX to represent the max value of type unsigned int. So replace the actual value by using this define. Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07nvme: move nvme_cancel_request() to common codeMing Lin3-16/+18
So it can be used by fabrics driver also. Signed-off-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Keith Busch <keith.bsuch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07nvme: update and rename nvme_cancel_io to nvme_cancel_requestMing Lin1-4/+4
nvme_cancel_io is a bit confusing (given the distinction of io/admin), so rename it to nvme_cancel_request. And update it a bit to pass in struct nvme_ctrl, so it can be used by Fabrics driver also. Signed-off-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Keith Busch <keith.bsuch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block, drivers: add REQ_OP_FLUSH operationMike Christie1-1/+1
This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07drivers: use req op accessorMike Christie2-3/+3
The req operation REQ_OP is separated from the rq_flag_bits definition. This converts the block layer drivers to use req_op to get the op from the request struct. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-18nvme/host: Add missing blk_integrity tag_size + flags assignmentsNicholas Bellinger1-0/+4
While doing recent bring-up of nvme/host with target-core T10-PI, I noticed /sys/block/nvme*/integrity/device_is_integrity_capable was false, and /sys/block/nvme*/integrity/tag_size contained a bogus value. AFAICT outside of blk_integrity_compare() for DM + MD these are informational values, but go ahead and add the missing assignments for nvme/host to match what SCSI does within sd_dif_config_host() for consistency's sake. Cc: Keith Busch <keith.busch@intel.com> Cc: Jay Freyensee <james.p.freyensee@intel.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Sagi Grimberg <sagig@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Sagi Grimberg <sagi at grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-18NVMe: Add device ID's with stripe quirkKeith Busch1-0/+6
Adds two Intel controllers that have the "stripe" quirk. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-18NVMe: Short-cut removal on surprise hot-unplugKeith Busch3-0/+23
This patch adds a new state that when set has the core automatically kill request queues prior to removing namespaces. If PCI device is not present at the time the nvme driver's remove is called, we can kill all IO queues immediately instead of waiting for the watchdog thread to do that at its polling interval. This improves scenarios where multiple hot plug events occur at the same time since it doesn't block the pci enumeration for as long. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-18NVMe: Allow user initiated rescanKeith Busch1-0/+15
This exposes ioctl and sysfs methods a user can invoke to request the driver rescan a controller and its namespaces. This is less harsh than doing a controller reset, which temporarilly halts all IO, just to surface a newly attached namespace. This is mainly useful for controllers that implement the namespace management command, but do not support the namespace notify change asynchronous event notification. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-18NVMe: Reduce driver log spammingKeith Busch1-1/+4
Reduce error logging when no corrective action is required. Suggessted-by: Chris Petersen <cpetersen@fb.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-18NVMe: Unbind driver on failureKeith Busch1-1/+1
Instead of removing the PCI device from the kernel's topology on controller failure, this patch simply requests unbinding the device from the driver. This avoids concurrently running pci removal with the hot plug event, which has been reported to be problematic when multiple surprise events occur near simultaneously. The other benefit is that we will have PCI config and memory space available to poke around for debugging a failed controller, assuming the device was not physically removed. The down side occurs if the platform and/or kernel do not support any type of surprise hot removal. The device will remain visible through sysfs (and therefore lspci), and some manual work is necessary to get the logical topology corrected. But if your platform and/or kernel don't support surprise removal, you probably shouldn't be doing that anyway. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-18NVMe: Delete only created queuesKeith Busch1-2/+2
Use the online queue count instead of the number of allocated queues. The controller should just return an invalid queue identifier error to the commands if a queue wasn't created. While it's not harmful, it's still not correct. Reported-by: Saar Gross <saar@annapurnalabs.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>