cxl/pmem: Fix module reload vs workqueue state

A test of the form: while true; do modprobe -r cxl_pmem; modprobe cxl_pmem; done May lead to a crash signature of the form: BUG: unable to handle page fault for address: ffffffffc0660030 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page [..] Workqueue: cxl_pmem 0xffffffffc0660030 RIP: 0010:0xffffffffc0660030 Code: Unable to access opcode bytes at RIP 0xffffffffc0660006. [..] Call Trace: ? process_one_work+0x4ec/0x9c0 ? pwq_dec_nr_in_flight+0x100/0x100 ? rwlock_bug.part.0+0x60/0x60 ? worker_thread+0x2eb/0x700 In that report the 0xffffffffc0660030 address corresponds to the former function address of cxl_nvb_update_state() from a previous load of the module, not the current address. Fix that by arranging for ->state_work in the 'struct cxl_nvdimm_bridge' object to be reinitialized on cxl_pmem module reload. Details: Recall that CXL subsystem wants to link a CXL memory expander device to an NVDIMM sub-hierarchy when both a persistent memory range has been registered by the CXL platform driver (cxl_acpi) *and* when that CXL memory expander has published persistent memory capacity (Get Partition Info). To this end the cxl_nvdimm_bridge driver arranges to rescan the CXL bus when either of those conditions change. The helper bus_rescan_devices() can not be called underneath the device_lock() for any device on that bus, so the cxl_nvdimm_bridge driver uses a workqueue for the rescan. Typically a driver allocates driver data to hold a 'struct work_struct' for a driven device, but for a workqueue that may run after ->remove() returns, driver data will have been freed. The 'struct cxl_nvdimm_bridge' object holds the state and work_struct directly. Unfortunately it was only arranging for that infrastructure to be initialized once per device creation rather than the necessary once per workqueue (cxl_pmem_wq) creation. Introduce is_cxl_nvdimm_bridge() and cxl_nvdimm_bridge_reset() in support of invalidating stale references to a recently destroyed cxl_pmem_wq. Cc: <stable@vger.kernel.org> Fixes: 8fdcb1704f61 ("cxl/pmem: Add initial infrastructure for pmem support") Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/163665474585.3505991.8397182770066720755.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
author: Dan Williams <dan.j.williams@intel.com> 2021-11-11 21:19:05 +0300
committer: Dan Williams <dan.j.williams@intel.com> 2021-11-15 22:03:00 +0300
commit: 53989fad1286e652ea3655ae3367ba698da8d2ff (patch)
tree: bfba1e239c0049c3ce27af74f55bce32b3621c94 /drivers/cxl/cxl.h
parent: fd49f99c180996cef2d707ad71bee4f060dbe367 (diff)
download: linux-53989fad1286e652ea3655ae3367ba698da8d2ff.tar.xz
1 files changed, 8 insertions, 0 deletions
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 3af704e9b448..ab4596f0b751 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -196,6 +196,13 @@ struct cxl_decoder {
 };
 
 
+/**
+ * enum cxl_nvdimm_brige_state - state machine for managing bus rescans
+ * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed
+ * @CXL_NVB_DEAD: Set at brige unregistration to preclude async probing
+ * @CXL_NVB_ONLINE: Target state after successful ->probe()
+ * @CXL_NVB_OFFLINE: Target state after ->remove() or failed ->probe()
+ */
 enum cxl_nvdimm_brige_state {
 	CXL_NVB_NEW,
 	CXL_NVB_DEAD,
@@ -308,6 +315,7 @@ struct cxl_nvdimm_bridge *devm_cxl_add_nvdimm_bridge(struct device *host,
 						     struct cxl_port *port);
 struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev);
 bool is_cxl_nvdimm(struct device *dev);
+bool is_cxl_nvdimm_bridge(struct device *dev);
 int devm_cxl_add_nvdimm(struct device *host, struct cxl_memdev *cxlmd);
 struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_nvdimm *cxl_nvd);
author	Dan Williams <dan.j.williams@intel.com>	2021-11-11 21:19:05 +0300
committer	Dan Williams <dan.j.williams@intel.com>	2021-11-15 22:03:00 +0300
commit	53989fad1286e652ea3655ae3367ba698da8d2ff (patch)
tree	bfba1e239c0049c3ce27af74f55bce32b3621c94 /drivers/cxl/cxl.h
parent	fd49f99c180996cef2d707ad71bee4f060dbe367 (diff)
download	linux-53989fad1286e652ea3655ae3367ba698da8d2ff.tar.xz