summaryrefslogtreecommitdiff
path: root/include/linux/mlx4/cmd.h
diff options
context:
space:
mode:
authorDavid S. Miller <davem@davemloft.net>2015-01-26 01:43:19 +0300
committerDavid S. Miller <davem@davemloft.net>2015-01-26 01:43:19 +0300
commitbc579ae5f902e7a17d4a02ca32779c90604d57b9 (patch)
tree29dcfdc9bb9636ee93c8940a1b355c26a30c01b1 /include/linux/mlx4/cmd.h
parent7aee42c6764bae75d0eb2f674f0874193de90c05 (diff)
parent0cd9302734111abc0b5912b695336f2ee63cb22b (diff)
downloadlinux-bc579ae5f902e7a17d4a02ca32779c90604d57b9.tar.xz
Merge branch 'mlx4-next'
Or Gerlitz says: ==================== mlx4: Fix and enhance the device reset flow This series from Yishai Hadas fixes the device reset flow and adds SRIOV support. Reset flows are required whenever a device experiences errors, is unresponsive, or is not in a deterministic state. In such cases, the driver is expected to reset the HW and continue operation. When SRIOV is enabled, these requirements apply both to PF and VF devices. Currently, the mlx4 reset flow doesn't work properly: when a fatal error is detected on the FW internal buffer the chip is not reset and stays in its bad state. There are cases that assumed to be fatal such as non-responsive FW, errors via closing commands but are not handled today. The AER mechanism should also be fixed: - It should use mlx4_load_one instead of __mlx4_init_one which is done upon HCA probing. - It must be aligned with concurrent catas flow, mark device to be in an error state, reset chip, etc. - Port types should be restored to their original values before error occurred. In addition, there the SRIOV use-case isn't supported. In above cases when the device state becomes fatal we must act as follows: 1) Reset the chip and mark the HW device state as in fatal error. 2) Wake up any pending commands, preventing new ones to come in. 3) Restart the software stack. We also address the SRIOV mode as follows: In case the PF detects a fatal error, it lets VFs know about that, then both itself and VFs are restarted asynchronously. However, in case only the VF encountered a fatal case or forced to be reset, they reset the VF stuff and then restart software. changes from V0: No need to call pci_disable_device upon permanent PCI error. This will be done as part of mlx4_remove_one which is called later once we return PCI_ERS_RESULT_DISCONNECT from the pci error handler. Initial toggle value should use only the T bit and not the whole byte value. Not doing so sometimes broke SRIOV as of junky value seen by the VF as a non-ready comm channel ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'include/linux/mlx4/cmd.h')
-rw-r--r--include/linux/mlx4/cmd.h3
1 files changed, 3 insertions, 0 deletions
diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h
index 64d25941b329..c989442ffc6a 100644
--- a/include/linux/mlx4/cmd.h
+++ b/include/linux/mlx4/cmd.h
@@ -279,6 +279,8 @@ int mlx4_get_vf_config(struct mlx4_dev *dev, int port, int vf, struct ifla_vf_in
int mlx4_set_vf_link_state(struct mlx4_dev *dev, int port, int vf, int link_state);
int mlx4_config_dev_retrieval(struct mlx4_dev *dev,
struct mlx4_config_dev_params *params);
+void mlx4_cmd_wake_completions(struct mlx4_dev *dev);
+void mlx4_report_internal_err_comm_event(struct mlx4_dev *dev);
/*
* mlx4_get_slave_default_vlan -
* return true if VST ( default vlan)
@@ -288,5 +290,6 @@ bool mlx4_get_slave_default_vlan(struct mlx4_dev *dev, int port, int slave,
u16 *vlan, u8 *qos);
#define MLX4_COMM_GET_IF_REV(cmd_chan_ver) (u8)((cmd_chan_ver) >> 8)
+#define COMM_CHAN_EVENT_INTERNAL_ERR (1 << 17)
#endif /* MLX4_CMD_H */