From 18f87a67e6d681d1c6f8b3c47985f21b25959a77 Mon Sep 17 00:00:00 2001 From: Martin Peschke Date: Thu, 13 Nov 2014 14:59:48 +0100 Subject: zfcp: auto port scan resiliency This patch improves the Fibre Channel port scan behaviour of the zfcp lldd. Without it the zfcp device driver may churn up the storage area network by excessive scanning and scan bursts, particularly in big virtual server environments, potentially resulting in interference of virtual servers and reduced availability of storage connectivity. The two main issues as to the zfcp device drivers automatic port scan in virtual server environments are frequency and simultaneity. On the one hand, there is no point in allowing lots of ports scans in a row. It makes sense, though, to make sure that a scan is conducted eventually if there has been any indication for potential SAN changes. On the other hand, lots of virtual servers receiving the same indication for a SAN change had better not attempt to conduct a scan instantly, that is, at the same time. Hence this patch has a two-fold approach for better port scanning: the introduction of a rate limit to amend frequency issues, and the introduction of a short random backoff to amend simultaneity issues. Both approaches boil down to deferred port scans, with delays comprising parts for both approaches. The new port scan behaviour is summarised best by: NEW: NEW: no_auto_port_rescan random rate flush backoff limit =wait adapter resume/thaw yes yes no yes* adapter online (user) no yes no yes* port rescan (user) no no no yes adapter recovery (user) yes yes yes no adapter recovery (other) yes yes yes no incoming ELS yes yes yes no incoming ELS lost yes yes yes no Implementation is straight-forward by converting an existing worker to a delayed worker. But care is needed whenever that worker is going to be flushed (in order to make sure work has been completed), since a flush operation cancels the timer set up for deferred execution (see * above). There is a small race window whenever a port scan work starts running up to the point in time of storing the time stamp for that port scan. The impact is negligible. Closing that gap isn't trivial, though, and would the destroy the beauty of a simple work-to-delayed-work conversion. Signed-off-by: Martin Peschke Signed-off-by: Steffen Maier Reviewed-by: Hannes Reinecke Signed-off-by: Christoph Hellwig --- drivers/s390/scsi/zfcp_aux.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'drivers/s390/scsi/zfcp_aux.c') diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c index 8004b071a9f2..01a73395a017 100644 --- a/drivers/s390/scsi/zfcp_aux.c +++ b/drivers/s390/scsi/zfcp_aux.c @@ -353,9 +353,11 @@ struct zfcp_adapter *zfcp_adapter_enqueue(struct ccw_device *ccw_device) adapter->ccw_device = ccw_device; INIT_WORK(&adapter->stat_work, _zfcp_status_read_scheduler); - INIT_WORK(&adapter->scan_work, zfcp_fc_scan_ports); + INIT_DELAYED_WORK(&adapter->scan_work, zfcp_fc_scan_ports); INIT_WORK(&adapter->ns_up_work, zfcp_fc_sym_name_update); + adapter->next_port_scan = jiffies; + if (zfcp_qdio_setup(adapter)) goto failed; @@ -420,7 +422,7 @@ void zfcp_adapter_unregister(struct zfcp_adapter *adapter) { struct ccw_device *cdev = adapter->ccw_device; - cancel_work_sync(&adapter->scan_work); + cancel_delayed_work_sync(&adapter->scan_work); cancel_work_sync(&adapter->stat_work); cancel_work_sync(&adapter->ns_up_work); zfcp_destroy_adapter_work_queue(adapter); -- cgit v1.2.3