kernel/linux.git/drivers/cxl, branch v6.12.80

cxl/hdm: Avoid incorrect DVSEC fallback when HDM decoders are enabled

2026-04-02T11:09:25+00:00

[ Upstream commit 75cea0776de502f2a1be5ca02d37c586dc81887e ] Check the global CXL_HDM_DECODER_ENABLE bit instead of looping over per-decoder COMMITTED bits to determine whether to fall back to DVSEC range emulation. When the HDM decoder capability is globally enabled, ignore DVSEC range registers regardless of individual decoder commit state. should_emulate_decoders() currently loops over per-decoder COMMITTED bits, which leads to an incorrect DVSEC fallback when those bits are zero. One way to trigger this is to destroy a region and bounce the memdev: cxl disable-region region0 cxl destroy-region region0 cxl disable-memdev mem0 cxl enable-memdev mem0 Region teardown zeroes the HDM decoder registers including the committed bits. The subsequent memdev re-probe finds uncommitted decoders and falls back to DVSEC emulation, even though HDM remains globally enabled. Observed failures: should_emulate_decoders: cxl_port endpoint6: decoder6.0: committed: 0 base: 0x0_00000000 size: 0x0_00000000 devm_cxl_setup_hdm: cxl_port endpoint6: Fallback map 1 range register .. devm_cxl_add_region: cxl_acpi ACPI0017:00: decoder0.0: created region0 __construct_region: cxl_pci 0000:e1:00.0: mem1:decoder6.0: __construct_region region0 res: [mem 0x850000000-0x284fffffff flags 0x200] iw: 1 ig: 4096 cxl region0: pci0000:e0:port1 cxl_port_setup_targets expected iw: 1 ig: 4096 .. cxl region0: pci0000:e0:port1 cxl_port_setup_targets got iw: 1 ig: 256 state: disabled .. cxl_port endpoint6: failed to attach decoder6.0 to region0: -6 .. devm_cxl_add_region: cxl_acpi ACPI0017:00: decoder0.0: created region4 alloc_hpa: cxl region4: HPA allocation error (-34) .. Fixes: 52cc48ad2a76 ("cxl/hdm: Limit emulation to the number of range registers") Signed-off-by: Smita Koralahalli Reviewed-by: Dan Williams Link: https://patch.msgid.link/20260316201950.224567-1-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang Signed-off-by: Sasha Levin

cxl/port: Fix use after free of parent_port in cxl_detach_ep()

2026-04-02T11:09:24+00:00

[ Upstream commit 19d2f0b97a131198efc2c4ca3eb7f980bba8c2b4 ] cxl_detach_ep() is called during bottom-up removal when all CXL memory devices beneath a switch port have been removed. For each port in the hierarchy it locks both the port and its parent, removes the endpoint, and if the port is now empty, marks it dead and unregisters the port by calling delete_switch_port(). There are two places during this work where the parent_port may be used after freeing: First, a concurrent detach may have already processed a port by the time a second worker finds it via bus_find_device(). Without pinning parent_port, it may already be freed when we discover port->dead and attempt to unlock the parent_port. In a production kernel that's a silent memory corruption, with lock debug, it looks like this: []DEBUG_LOCKS_WARN_ON(__owner_task(owner) != get_current()) []WARNING: kernel/locking/mutex.c:949 at __mutex_unlock_slowpath+0x1ee/0x310 []Call Trace: []mutex_unlock+0xd/0x20 []cxl_detach_ep+0x180/0x400 [cxl_core] []devm_action_release+0x10/0x20 []devres_release_all+0xa8/0xe0 []device_unbind_cleanup+0xd/0xa0 []really_probe+0x1a6/0x3e0 Second, delete_switch_port() releases three devm actions registered against parent_port. The last of those is unregister_port() and it calls device_unregister() on the child port, which can cascade. If parent_port is now also empty the device core may unregister and free it too. So by the time delete_switch_port() returns, parent_port may be free, and the subsequent device_unlock(&parent_port->dev) operates on freed memory. The kernel log looks same as above, with a different offset in cxl_detach_ep(). Both of these issues stem from the absence of a lifetime guarantee between a child port and its parent port. Establish a lifetime rule for ports: child ports hold a reference to their parent device until release. Take the reference when the port is allocated and drop it when released. This ensures the parent is valid for the full lifetime of the child and eliminates the use after free window in cxl_detach_ep(). This is easily reproduced with a reload of cxl_acpi in QEMU with CXL devices present. Fixes: 2345df54249c ("cxl/memdev: Fix endpoint port removal") Reviewed-by: Dave Jiang Reviewed-by: Li Ming Signed-off-by: Alison Schofield Reviewed-by: Jonathan Cameron Link: https://patch.msgid.link/20260226184439.1732841-1-alison.schofield@intel.com Signed-off-by: Dave Jiang Signed-off-by: Sasha Levin

cxl: Fix premature commit_end increment on decoder commit failure

2026-03-04T12:20:22+00:00

[ Upstream commit 7b6f9d9b1ea05c9c22570126547c780e8c6c3f62 ] In cxl_decoder_commit(), commit_end is incremented before verifying whether the commit succeeded, and the CXL_DECODER_F_ENABLE bit in cxld->flags is only set after a successful commit. As a result, if the commit fails, commit_end has been incremented and cxld->reset() has no effect since the flag is not set, so commit_end remains incorrectly incremented. The inconsistency between commit_end and CXL_DECODER_F_ENABLE causes failure during subsequent either commit or reset operations. Fix this by incrementing commit_end only after confirming the commit succeeded. Also, remove the ineffective cxld->reset() call. According to CXL Spec r4.0 8.2.4.20.12 Committing Decoder Programming, since cxld_await_commit() has cleared the decoder commit bit on failure, no additional reset is required. [dj: Fixed commit log 80 char wrapping. ] [dj: Fix "Fixes" tag to correct hash length. ] [dj: Change spec to r4.0. ] Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Signed-off-by: Yuxiong Wang Acked-by: Huang Ying Reviewed-by: Dave Jiang Reviewed-by: Alison Schofield Link: https://patch.msgid.link/20260129064552.31180-1-yuxiong.wang@linux.alibaba.com Signed-off-by: Dave Jiang Signed-off-by: Sasha Levin

cxl/region: Add a dev_err() on missing target list entries

2025-07-06T09:01:32+00:00

[ Upstream commit d90acdf49e18029cfe4194475c45ef143657737a ] Broken target lists are hard to discover as the driver fails at a later initialization stage. Add an error message for this. Example log messages: cxl_mem mem1: failed to find endpoint6:0000:e0:01.3 in target list of decoder1.1 cxl_port endpoint6: failed to register decoder6.0: -6 cxl_port endpoint6: probe: 0 Signed-off-by: Robert Richter Reviewed-by: Gregory Price Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Reviewed-by: Dan Williams Reviewed-by: Alison Schofield Reviewed-by: "Fabio M. De Francesco" Tested-by: Gregory Price Acked-by: Dan Williams Link: https://patch.msgid.link/20250509150700.2817697-14-rrichter@amd.com Signed-off-by: Dave Jiang Signed-off-by: Sasha Levin

cxl/core/regs.c: Skip Memory Space Enable check for RCD and RCH Ports

2025-05-02T05:59:08+00:00

commit 078d3ee7c162cd66d76171579c02d7890bd77daf upstream. According to CXL r3.2 section 8.2.1.2, the PCI_COMMAND register fields, including Memory Space Enable bit, have no effect on the behavior of an RCD Upstream Port. Retaining this check may incorrectly cause cxl_pci_probe() to fail on a valid RCD upstream Port. While the specification is explicit only for RCD Upstream Ports, this check is solely for accessing the RCRB, which is always mapped through memory space. Therefore, its safe to remove the check entirely. In practice, firmware reliably enables the Memory Space Enable bit for RCH Downstream Ports and no failures have been observed. Removing the check simplifies the code and avoids unnecessary special-casing, while relying on BIOS/firmware to configure devices correctly. Moreover, any failures due to inaccessible RCRB regions will still be caught either in __rcrb_to_component() or while parsing the component register block. The following failure was observed in dmesg when the check was present: cxl_pci 0000:7f:00.0: No component registers (-6) Fixes: d5b1a27143cb ("cxl/acpi: Extract component registers of restricted hosts from RCRB") Signed-off-by: Smita Koralahalli Cc: Reviewed-by: Ira Weiny Reviewed-by: Terry Bowman Reviewed-by: Dave Jiang Reviewed-by: Robert Richter Link: https://patch.msgid.link/20250407192734.70631-1-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang Signed-off-by: Greg Kroah-Hartman

cxl/region: Fix region creation for greater than x2 switches

2024-12-27T13:02:01+00:00

[ Upstream commit 76467a94810c2aa4dd3096903291ac6df30c399e ] The cxl_port_setup_targets() algorithm fails to identify valid target list ordering in the presence of 4-way and above switches resulting in 'cxl create-region' failures of the form: $ cxl create-region -d decoder0.0 -g 1024 -s 2G -t ram -w 8 -m mem4 mem1 mem6 mem3 mem2 mem5 mem7 mem0 cxl region: create_region: region0: failed to set target7 to mem0 cxl region: cmd_create_region: created 0 regions [kernel debug message] check_last_peer:1213: cxl region0: pci0000:0c:port1: cannot host mem6:decoder7.0 at 2 bus_remove_device:574: bus: 'cxl': remove device region0 QEMU can create this failing topology: ACPI0017:00 [root0] | HB_0 [port1] / \ RP_0 RP_1 | | USP [port2] USP [port3] / / \ \ / / \ \ DSP DSP DSP DSP DSP DSP DSP DSP | | | | | | | | mem4 mem6 mem2 mem7 mem1 mem3 mem5 mem0 Pos: 0 2 4 6 1 3 5 7 HB: Host Bridge RP: Root Port USP: Upstream Port DSP: Downstream Port ...with the following command steps: $ qemu-system-x86_64 -machine q35,cxl=on,accel=tcg \ -smp cpus=8 \ -m 8G \ -hda /home/work/vm-images/centos-stream8-02.qcow2 \ -object memory-backend-ram,size=4G,id=m0 \ -object memory-backend-ram,size=4G,id=m1 \ -object memory-backend-ram,size=2G,id=cxl-mem0 \ -object memory-backend-ram,size=2G,id=cxl-mem1 \ -object memory-backend-ram,size=2G,id=cxl-mem2 \ -object memory-backend-ram,size=2G,id=cxl-mem3 \ -object memory-backend-ram,size=2G,id=cxl-mem4 \ -object memory-backend-ram,size=2G,id=cxl-mem5 \ -object memory-backend-ram,size=2G,id=cxl-mem6 \ -object memory-backend-ram,size=2G,id=cxl-mem7 \ -numa node,memdev=m0,cpus=0-3,nodeid=0 \ -numa node,memdev=m1,cpus=4-7,nodeid=1 \ -netdev user,id=net0,hostfwd=tcp::2222-:22 \ -device virtio-net-pci,netdev=net0 \ -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ -device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \ -device cxl-upstream,bus=root_port0,id=us0 \ -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \ -device cxl-type3,bus=swport0,volatile-memdev=cxl-mem0,id=cxl-vmem0 \ -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \ -device cxl-type3,bus=swport1,volatile-memdev=cxl-mem1,id=cxl-vmem1 \ -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \ -device cxl-type3,bus=swport2,volatile-memdev=cxl-mem2,id=cxl-vmem2 \ -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \ -device cxl-type3,bus=swport3,volatile-memdev=cxl-mem3,id=cxl-vmem3 \ -device cxl-upstream,bus=root_port1,id=us1 \ -device cxl-downstream,port=4,bus=us1,id=swport4,chassis=0,slot=8 \ -device cxl-type3,bus=swport4,volatile-memdev=cxl-mem4,id=cxl-vmem4 \ -device cxl-downstream,port=5,bus=us1,id=swport5,chassis=0,slot=9 \ -device cxl-type3,bus=swport5,volatile-memdev=cxl-mem5,id=cxl-vmem5 \ -device cxl-downstream,port=6,bus=us1,id=swport6,chassis=0,slot=10 \ -device cxl-type3,bus=swport6,volatile-memdev=cxl-mem6,id=cxl-vmem6 \ -device cxl-downstream,port=7,bus=us1,id=swport7,chassis=0,slot=11 \ -device cxl-type3,bus=swport7,volatile-memdev=cxl-mem7,id=cxl-vmem7 \ -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=32G & In Guest OS: $ cxl create-region -d decoder0.0 -g 1024 -s 2G -t ram -w 8 -m mem4 mem1 mem6 mem3 mem2 mem5 mem7 mem0 Fix the method to calculate @distance by iterativeley multiplying the number of targets per switch port. This also follows the algorithm recommended here [1]. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Link: http://lore.kernel.org/6538824b52349_7258329466@dwillia2-xfh.jf.intel.com.notmuch [1] Signed-off-by: Huaisheng Ye Tested-by: Li Zhijian [djbw: add a comment explaining 'distance'] Signed-off-by: Dan Williams Link: https://patch.msgid.link/173378716722.1270362.9546805175813426729.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang Signed-off-by: Sasha Levin

cxl/pci: Fix potential bogus return value upon successful probing

2024-12-27T13:02:01+00:00

[ Upstream commit da4d8c83358163df9a4addaeba0ef8bcb03b22e8 ] If cxl_pci_ras_unmask() returns non-zero, cxl_pci_probe() will end up returning that value, instead of zero. Fixes: 248529edc86f ("cxl: add RAS status unmasking for CXL") Reviewed-by: Fan Ni Signed-off-by: Davidlohr Bueso Reviewed-by: Ira Weiny Link: https://patch.msgid.link/20241115170032.108445-1-dave@stgolabs.net Signed-off-by: Dave Jiang Signed-off-by: Sasha Levin

cxl/port: Prevent out-of-order decoder allocation

2024-10-25T21:07:03+00:00

With the recent change to allow out-of-order decoder de-commit it highlights a need to strengthen the in-order decoder commit guarantees. As it stands match_free_decoder() ensures that if 2 regions are racing decoder allocations the one that wins the race will get the lower id decoder, but that still leaves the race to *commit* the decoder. Rather than have this complicated case of "reserved in-order, but may still commit out-of-order", just arrange for the reservation order to match the commit-order. In other words, prevent subsequent allocations until the last reservation is committed. This precludes overlapping region creation events and requires the previous regionN to either move forward to the decoder commit stage or drop its reservation before regionN+1 can move forward. That is, provided that regionN and regionN+1 decode through the same switch port. As a side effect this allows match_free_decoder() to drop its dependency on needing write access to the device_find_child() @data parameter [1]. Reported-by: Zijun Hu Closes: http://lore.kernel.org/20240905-const_dfc_prepare-v4-0-4180e1d5a244@quicinc.com Cc: Davidlohr Bueso Cc: Vishal Verma Cc: Alison Schofield Cc: Jonathan Cameron Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron Reviewed-by: Ira Weiny Link: https://patch.msgid.link/172964783668.81806.14962699553881333486.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny

cxl/port: Fix use-after-free, permit out-of-order decoder shutdown

2024-10-25T21:07:03+00:00

In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable@vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron Cc: Greg Kroah-Hartman Cc: Davidlohr Bueso Cc: Dave Jiang Cc: Alison Schofield Cc: Ira Weiny Cc: Zijun Hu Signed-off-by: Dan Williams Reviewed-by: Ira Weiny Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny

cxl/acpi: Ensure ports ready at cxl_acpi_probe() return

2024-10-25T21:07:03+00:00

In order to ensure root CXL ports are enabled upon cxl_acpi_probe() when the 'cxl_port' driver is built as a module, arrange for the module to be pre-loaded or built-in. The "Fixes:" but no "Cc: stable" on this patch reflects that the issue is merely by inspection since the bug that triggered the discovery of this potential problem [1] is fixed by other means. However, a stable backport should do no harm. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Signed-off-by: Dan Williams Tested-by: Gregory Price Reviewed-by: Jonathan Cameron Reviewed-by: Ira Weiny Link: https://patch.msgid.link/172964781969.81806.17276352414854540808.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny