nvme-fabrics: reject I/O to offline device

Commands get stuck while the host NVMe-oF controller is in the reconnect state. The controller enters the reconnect state when it loses its connection to the target. It then tries to reconnect every 10 seconds (by default) until a reconnect succeeds or the reconnect timeout is reached. The default reconnect timeout is 10 minutes.

Applications expect commands to complete, with success or an error, within a certain timeout (30 seconds by default). The NVMe host enforces that timeout while it is connected, but during reconnect the timeout is not enforced, so commands may get stuck for a long period or even forever.

To avoid this long delay, introduce a new "fast_io_fail_tmo" session parameter. The timeout is measured in seconds from the start of the controller reconnect, and any command beyond that timeout is rejected. The new parameter value may be passed during 'connect'. The default value of -1 means no timeout (similar to the current behavior).

Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
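
The fail-fast rule described in the commit message can be modeled in a few lines of user-space C. This is only an illustrative sketch, not the kernel implementation: `struct sketch_ctrl`, `enum sketch_state`, and `sketch_should_fail_io()` are hypothetical names, time is a plain seconds counter, and treating the exact expiry instant as "rejected" is an assumption made here.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum sketch_state { SKETCH_LIVE, SKETCH_RECONNECTING };

struct sketch_ctrl {
	enum sketch_state state;
	long reconnect_start;	/* seconds: when the reconnect began */
	int fast_io_fail_tmo;	/* seconds; a negative value disables fail-fast */
};

/* Return true if a command submitted at time `now` should be rejected. */
static bool sketch_should_fail_io(const struct sketch_ctrl *c, long now)
{
	/* While connected, the regular command timeout applies instead. */
	if (c->state != SKETCH_RECONNECTING)
		return false;

	/* -1 (the default) means no fail-fast timeout at all. */
	if (c->fast_io_fail_tmo < 0)
		return false;

	/* Assumption: the boundary second itself counts as expired. */
	return now - c->reconnect_start >= c->fast_io_fail_tmo;
}
```

With `fast_io_fail_tmo` set to 5 and a reconnect that began at second 100, a command arriving at second 102 is still held, while one arriving at second 105 or later is rejected. With the default of -1, nothing is ever rejected by this rule.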
author: Victor Gladkov <Victor.Gladkov@kioxia.com> 2020-11-24 21:34:59 +0300
committer: Christoph Hellwig <hch@lst.de> 2020-12-01 22:36:37 +0300
commit: 8c4dfea97f15b80097b3f882ca428fb2751ec30c (patch)
tree: 122bcd90a4fb7504e3fc3b2b82bab80184cc7496 /drivers/nvme/host/fabrics.h
parent: 9f20599c4821d1f7281a3efb3ef94ff3cfdd5e10 (diff)
download: linux-8c4dfea97f15b80097b3f882ca428fb2751ec30c.tar.xz
1 files changed, 5 insertions, 0 deletions
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index a9c1e3b4585e..733010d2eafd 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -15,6 +15,8 @@
 #define NVMF_DEF_RECONNECT_DELAY	10
 /* default to 600 seconds of reconnect attempts before giving up */
 #define NVMF_DEF_CTRL_LOSS_TMO		600
+/* default is -1: the fail fast mechanism is disabled  */
+#define NVMF_DEF_FAIL_FAST_TMO		-1
 
 /*
  * Define a host as seen by the target.  We allocate one at boot, but also
@@ -56,6 +58,7 @@ enum {
 	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
 	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
 	NVMF_OPT_TOS		= 1 << 19,
+	NVMF_OPT_FAIL_FAST_TMO	= 1 << 20,
 };
 
 /**
@@ -89,6 +92,7 @@ enum {
  * @nr_write_queues: number of queues for write I/O
  * @nr_poll_queues: number of queues for polling I/O
  * @tos: type of service
+ * @fast_io_fail_tmo: Fast I/O fail timeout in seconds
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -111,6 +115,7 @@ struct nvmf_ctrl_options {
 	unsigned int		nr_write_queues;
 	unsigned int		nr_poll_queues;
 	int			tos;
+	int			fast_io_fail_tmo;
 };
 
 /*
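
The header changes above add a default value, a mask bit, and a struct field. How they might fit together when a 'connect' option string is processed can be sketched as follows. The two macros are copied from the diff; everything else (`struct sketch_opts`, `sketch_opts_init()`, `sketch_set_fast_io_fail_tmo()`) is a hypothetical user-space stand-in, not the parser in fabrics.c.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Copied from the diff above. */
#define NVMF_DEF_FAIL_FAST_TMO	-1
#define NVMF_OPT_FAIL_FAST_TMO	(1 << 20)

/* Hypothetical, trimmed-down stand-in for struct nvmf_ctrl_options. */
struct sketch_opts {
	unsigned mask;		/* which options were passed explicitly */
	int fast_io_fail_tmo;	/* seconds; -1 disables fail-fast */
};

/* Start from the default: fail-fast disabled, no option bits set. */
static void sketch_opts_init(struct sketch_opts *o)
{
	o->mask = 0;
	o->fast_io_fail_tmo = NVMF_DEF_FAIL_FAST_TMO;
}

/* Parse a decimal value; return 0 on success, -1 on malformed input. */
static int sketch_set_fast_io_fail_tmo(struct sketch_opts *o, const char *val)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(val, &end, 10);
	if (errno || end == val || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return -1;

	o->fast_io_fail_tmo = (int)v;
	o->mask |= NVMF_OPT_FAIL_FAST_TMO;	/* record that it was given */
	return 0;
}
```

Keeping a separate mask bit lets later code distinguish "the user explicitly asked for -1" from "the user said nothing and got the default", which the field value alone cannot express.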