Age | Commit message (Collapse) | Author | Files | Lines |
|
Add support to send CPER records to CXL for more detailed parsing.
|
|
If the firmware has configured CXL event support to be firmware first
the OS will receive those events through CPER records. The CXL layer has
unique DPA to HPA knowledge and existing event trace parsing in
place.[0]
Add a CXL CPER work item and register it with the GHES code to process
CPER events.
Link: http://lore.kernel.org/r/cover.1711598777.git.alison.schofield@intel.com [0]
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Link: https://lore.kernel.org/r/20240426-cxl-cper3-v4-2-58076cce1624@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
A recent bugfix to cxl_pmem_region_alloc() to fix an
error-unwind-memleak [1], highlighted a use case for scope-based resource
management.
Delete the goto for releasing @cxl_region_rwsem, and return error codes
directly from error condition paths.
The caller, devm_cxl_add_pmem_region(), is no longer given @cxlr_pmem
directly it must retrieve it from @cxlr->cxlr_pmem. This retrieval from
@cxlr was already in place for @cxlr->cxl_nvb, and converting
cxl_pmem_region_alloc() to return an int makes it less awkward to handle
no_free_ptr().
Cc: Li Zhijian <lizhijian@fujitsu.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Closes: http://lore.kernel.org/r/20240430174540.000039ce@Huawei.com
Link: http://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/171451430965.1147997.15782562063090960666.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
As a follow on to the recent rework of __cxl_parse_cfmws() to always
return errors [1], use cleanup.h helpers to remove goto and other cleanups
now that logging is moved to the cxl_parse_cfmws() wrapper.
This ends up adding more code than it deletes, but __cxl_parse_cfmws()
itself does get smaller. The takeaway from the cond_no_free_ptr()
discussion [2] was to not add new macros to handle the cases where
no_free_ptr() is awkward, instead rework the code to have helpers and
clearer delineation of responsibility.
Now one might say that __free(del_cxl_resource) is excessive given it
is immediately registered with add_or_reset_cxl_resource(). The
rationale for keeping it is that it forces use of "no_free_ptr()" on the
argument passed to add_or_reset_cxl_resource(). That in turn makes it
clear that @res is NULL for the rest of the function which is part of
the point of the cleanup helpers, to turn subtle use after free errors
[3] into loud NULL pointer de-references.
Link: http://lore.kernel.org/r/170820177238.631006.1012639681618409284.stgit@dwillia2-xfh.jf.intel.com [1]
Link: http://lore.kernel.org/r/CAHk-=whBVhnh=KSeBBRet=E7qJAwnPR_aj5em187Q3FiD+LXnA@mail.gmail.com [2]
Link: http://lore.kernel.org/r/20230714093146.2253438-1-leitao@debian.org [3]
Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Closes: http://lore.kernel.org/r/20240219124041.00002bda@Huawei.com
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/171235474028.2718248.14109646123143505522.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Before this error path, cxlr_pmem pointed to a kzalloc() memory, free
it to avoid this memory leaking.
Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Support for HPA to DPA translation for CXL events cxl_dram and
cxl_general_media.
|
|
User space may need to know which region, if any, maps the DPAs
(device physical addresses) reported in a cxl_general_media or
cxl_dram event. Since the mapping can change, the kernel provides
this information at the time the event occurs. This informs user
space that at event <timestamp> this <region> mapped this <DPA>
to this <HPA>.
Add the same region info that is included in the cxl_poison trace
event: the DPA->HPA translation, region name, and region uuid.
The new fields are inserted in the trace event and no existing
fields are modified. If the DPA is not mapped, user will see:
hpa=ULLONG_MAX, region="", and uuid=0
This work must be protected by dpa_rwsem & region_rwsem since
it is looking up region mappings.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/dd8d708b7a7ebfb64a27020a5eb338091336b34d.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
This work belongs in the region driver as it is only useful with
CONFIG_CXL_REGION. Add a stub in core.h for when the region driver
is not built.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/183222631f11a43c5e6debc42ec22fe1bd4b818a.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
This helper belongs in the region driver as it is only useful
with CONFIG_CXL_REGION. Add a stub in core.h for when the region
driver is not built.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/05e30f788d62b3dd398aff2d2ea50a6aaa7c3313.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The length of Physical Address in General Media and DRAM event
records is 64-bit, so the field mask for extracting the DPA should
be 64-bit also, otherwise the trace event reports DPA's with the
upper 32 bits of a DPA address masked off. If users do DPA-to-HPA
translations this could lead to incorrect page retirement decisions.
Use GENMASK_ULL() for CXL_DPA_MASK to get all the DPA address bits.
Tidy up CXL_DPA_FLAGS_MASK by using GENMASK() to only mask the exact
flag bits.
These bits are defined as part of the event record physical address
descriptions of General Media and DRAM events in CXL Spec 3.1
Section 8.2.9.2 Events.
Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record")
Co-developed-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/2867fc43c57720a4a15a3179431829b8dbd2dc16.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add CXL log related mailbox commands
- Add Get Log Capabilities command
- Add Get Supported Log Sub-List Commands command
- Add Clear Log command
|
|
The decoder enum has a name conversion function defined now.
Use that instead of open coding.
Suggested-by: Navneet Singh <navneet.singh@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v2-1-f740c47e7916@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The CXL driver uses both functions phys_to_target_node() and
memory_add_physaddr_to_nid(). The x86 architecture relies on the
NUMA_KEEP_MEMINFO kernel option enabled for both functions to work
correct. Update Kconfig to make sure the option is always enabled for
the driver.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Link: http://lore.kernel.org/r/65f8b191c0422_aa222941b@dwillia2-mobl3.amr.corp.intel.com.notmuch
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20240424154756.2152614-1-rrichter@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
A mixed mode decoder is programmed with device physical addresses
that span both ram and pmem partitions of a memdev.
Linux does not support mixed mode decoders. The driver rejects
sysfs writes that try to set decoder mode to mixed, and if a
resource bieng allocated is not wholly contained in either the
pmem or ram partition of a memdev, it is also rejected. Basically,
the CXL region driver is not going to create regions with mixed
mode decoders, but the BIOS could.
If the kernel driver sees the mixed mode decoder, it will fail to
enable the region, and emit a dev_dbg() message.
A dev_dbg() is not noisy enough in this case. Change the message
to be a dev_warn() that explicitly says mixed mode is not supported.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230218013834.31237-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
There's no debug message for invalid interleave granularity. This
makes it hard to debug related bugs. So, this is added in this patch.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240402061016.388408-1-ying.huang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Jonathan reported he has observed compiler warning when using running with
W=1 C=1 for cxl_security_ops that is declared as an extern in cxl/pmem.c.
Move to cxl.h to make it visible to all cxl sources.
Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/linux-cxl/167771196186.3285982.18283746206612049722.stgit@djiang5-mobl3.local/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Adding UAPI support for CXL r3.1 8.2.9.5.4
Clear Log command.
This proposed patch will be useful for clearing and populating
the Vendor debug log in certain scenarios, allowing for the
aggregation of results over time.
Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240313071218.729-3-sthanneeru.opensrc@micron.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Adding UAPI support for
1. CXL r3.1 8.2.9.5.3 Get Log Capabilities.
2. CXL r3.1 8.2.9.5.6 Get Supported Logs Sub-List.
Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240313071218.729-2-sthanneeru.opensrc@micron.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Robert reported the following when booting a CXL host with Restricted CXL
Host (RCH) topology:
[ 39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
[ 39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]
... plus some related subsequent NULL pointer dereference:
[ 40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8
The iterator to walk the PCIe path did not account for RCH topology.
However RCH does not support hotplug and the memory exported by the
Restricted CXL Device (RCD) should be covered by HMAT and therefore no
access_coordinate is needed. Add check to see if the endpoint device is
RCD and skip calculation.
Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
to exercise the topology iterator. The dev_is_pci() check added is to help
with this test and should be harmless for normal operation.
Reported-by: Robert Richter <rrichter@amd.com>
Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20240426224913.1027420-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
A recent change to cxl_mem_get_records_log() [1] highlighted a subtle
nuance of looping calls to cxl_internal_send_cmd(), i.e. that
cxl_internal_send_cmd() modifies the 'size_out' member of the @mbox_cmd
argument. That mechanism is useful for communicating underflow, but it
is unwanted when reusing @mbox_cmd for a subsequent submission. It turns
out that cxl_xfer_log() avoids this scenario by always redefining
@mbox_cmd each iteration.
Update cxl_mem_get_records_log() and cxl_mem_get_poison() to follow the
same style as cxl_xfer_log(), i.e. re-define @mbox_cmd each iteration.
The cxl_mem_get_records_log() change is just a style fixup, but the
cxl_mem_get_poison() change is a potential fix, per Alison [2]:
Poison list retrieval can hit this case if the MORE flag is set and
a follow on read of the list delivers more records than the previous
read. ie. device gives one record, sets the _MORE flag, then gives 5.
Not an urgent fix since this behavior has not been seen in the wild,
but worth tracking as a fix.
Cc: Kwangjin Ko <kwangjin.ko@sk.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Fixes: ed83f7ca398b ("cxl/mbox: Add GET_POISON_LIST mailbox command")
Link: http://lore.kernel.org/r/20240402081404.1106-2-kwangjin.ko@sk.com [1]
Link: http://lore.kernel.org/r/ZhAhAL/GOaWFrauw@aschofie-mobl2 [2]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/171235441633.2716581.12330082428680958635.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dave Jiang:
- Fix index of Clear Event Record handles in cxl_clear_event_record()
- Fix use before init of map->reg_type in cxl_decode_regblock()
- Fix initialization of mbox_cmd.size_out in cxl_mem_get_records_log()
- Fix CXL path access_coordinate computation:
- Remove unneded check of iter in loop
- Fix of retrieving of access_coordinate in PCI topology walk
- Fix of incorrect region access_coordinate data calculation
- Consolidate of access_coordinates attached to downstream port
context
- Add check to validate access_coordinate validity to prevent
incorrect data being exposed via sysfs
* tag 'cxl-fixes-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Add checks to access_coordinate calculation to fail missing data
cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord
cxl: Fix incorrect region perf data calculation
cxl: Fix retrieving of access_coordinates in PCIe path
cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
cxl/core: Fix initialization of mbox_cmd.size_out in get event
cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
cxl/mem: Fix for the index of Clear Event Record Handle
|
|
Jonathan noted that when the coordinates for host bridge and switches
can be 0s if no actual data are retrieved and the calculation continues.
The resulting number would be inaccurate. Add checks to ensure that the
calculation would complete only if the numbers are valid.
While not seen in the wild, issue may show up with a BIOS that reported
CXL root ports via Generic Ports (via a PCI handle in the SRAT entry).
Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-6-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The driver stores access_coordinate for host bridge in ->hb_coord and
switch CDAT access_coordinate in ->sw_coord. Since neither of these
access_coordinate clobber each other, the variable name can be consolidated
into ->coord to simplify the code.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-5-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Current math in cxl_region_perf_data_calculate divides the latency by 1000
every time the function gets called. This causes the region latency to be
divided by 1000 per memory device and the math is incorrect. This is user
visible as the latency access_coordinate exposed via sysfs will show
incorrect latency data.
Normalize values from CDAT to nanoseconds. Adjust sub-nanoseconds latency
to at least 1. Remove adjustment of perf numbers from the generic target
since hmat handling code has already normalized those numbers. Now all
computation and stored numbers should be in nanoseconds.
cxl_hb_get_perf_coordinates() is removed and HB coords are calculated
in the port access_coordinate calculation path since it no longer need
to be treated special.
Fixes: 3d9f4a197230 ("cxl/region: Calculate performance data for a region")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-4-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Current loop in cxl_endpoint_get_perf_coordinates() incorrectly assumes
the Root Port (RP) dport is the one with generic port access_coordinate.
However those coordinates are one level up in the Host Bridge (HB).
Current code causes the computation code to pick up 0s as the coordinates
and cause minimal bandwidth to result in 0.
Add check to skip RP when combining coordinates.
Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The while() loop in cxl_endpoint_get_perf_coordinates() checks to see if
'iter' is valid as part of the condition breaking out of the loop.
is_cxl_root() will stop the loop before the next iteration could go NULL.
Remove the iter check.
The presence of the iter or removing the iter does not impact the behavior
of the code. This is a code clean up and not a bug fix.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-2-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Since mbox_cmd.size_out is overwritten with the actual output size in
the function below, it needs to be initialized every time.
cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
Problem scenario:
1) The size_out variable is initially set to the size of the mailbox.
2) Read an event.
- size_out is set to 160 bytes(header 32B + one event 128B).
- Two event are created while reading.
3) Read the new *two* events.
- size_out is still set to 160 bytes.
- Although the value of out_len is 288 bytes, only 160 bytes are
copied from the mailbox register to the local variable.
- record_count is set to 2.
- Accessing records[1] will result in reading incorrect data.
Fixes: 6ebe28f9ec72 ("cxl/mem: Read, trace, and clear events on driver load")
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
assigned
In the error path, map->reg_type is being used for kernel warning
before its value is setup. Found by code inspection. Exposure to
user is wrong reg_type being emitted via kernel log. Use a local
var for reg_type and retrieve value for usage.
Fixes: 6c7f4f1e51c2 ("cxl/core/regs: Make cxl_map_{component, device}_regs() device generic")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The dev_dbg info for Clear Event Records mailbox command would report
the handle of the next record to clear not the current one.
This was because the index 'i' had incremented before printing the
current handle value.
Fixes: 6ebe28f9ec72 ("cxl/mem: Read, trace, and clear events on driver load")
Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Commit 5d7107c72796 ("perf: CXL Performance Monitoring Unit driver")
added the config entries for CXL_PMU in drivers/cxl/Kconfig and
drivers/perf/Kconfig, so it can be toggled from multiple locations:
[1] Device Drivers
-> PCI support
-> CXL (Compute Expres Link) Devices
-> CXL Performance Monitoring Unit
[2] Device Drivers
-> Performance monitor support
-> CXL Performance Monitoring Unit
This complicates things, and nobody else does this.
I kept the one in drivers/perf/Kconfig because CONFIG_CXL_PMU controls
the compilation of drivers/perf/cxl_pmu.c.
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
"Main user visible change:
- User events can now have "multi formats"
The current user events have a single format. If another event is
created with a different format, it will fail to be created. That
is, once an event name is used, it cannot be used again with a
different format. This can cause issues if a library is using an
event and updates its format. An application using the older format
will prevent an application using the new library from registering
its event.
A task could also DOS another application if it knows the event
names, and it creates events with different formats.
The multi-format event is in a different name space from the single
format. Both the event name and its format are the unique
identifier. This will allow two different applications to use the
same user event name but with different payloads.
- Added support to have ftrace_dump_on_oops dump out instances and
not just the main top level tracing buffer.
Other changes:
- Add eventfs_root_inode
Only the root inode has a dentry that is static (never goes away)
and stores it upon creation. There's no reason that the thousands
of other eventfs inodes should have a pointer that never gets set
in its descriptor. Create a eventfs_root_inode desciptor that has a
eventfs_inode descriptor and a dentry pointer, and only the root
inode will use this.
- Added WARN_ON()s in eventfs
There's some conditionals remaining in eventfs that should never be
hit, but instead of removing them, add WARN_ON() around them to
make sure that they are never hit.
- Have saved_cmdlines allocation also include the map_cmdline_to_pid
array
The saved_cmdlines structure allocates a large amount of data to
hold its mappings. Within it, it has three arrays. Two are already
apart of it: map_pid_to_cmdline[] and saved_cmdlines[]. More memory
can be saved by also including the map_cmdline_to_pid[] array as
well.
- Restructure __string() and __assign_str() macros used in
TRACE_EVENT()
Dynamic strings in TRACE_EVENT() are declared with:
__string(name, source)
And assigned with:
__assign_str(name, source)
In the tracepoint callback of the event, the __string() is used to
get the size needed to allocate on the ring buffer and
__assign_str() is used to copy the string into the ring buffer.
There's a helper structure that is created in the TRACE_EVENT()
macro logic that will hold the string length and its position in
the ring buffer which is created by __string().
There are several trace events that have a function to create the
string to save. This function is executed twice. Once for
__string() and again for __assign_str(). There's no reason for
this. The helper structure could also save the string it used in
__string() and simply copy that into __assign_str() (it also
already has its length).
By using the structure to store the source string for the
assignment, it means that the second argument to __assign_str() is
no longer needed.
It will be removed in the next merge window, but for now add a
warning if the source string given to __string() is different than
the source string given to __assign_str(), as the source to
__assign_str() isn't even used and will be going away.
- Added checks to make sure that the source of __string() is also the
source of __assign_str() so that it can be safely removed in the
next merge window.
Included fixes that the above check found.
- Other minor clean ups and fixes"
* tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits)
tracing: Add __string_src() helper to help compilers not to get confused
tracing: Use strcmp() in __assign_str() WARN_ON() check
tracepoints: Use WARN() and not WARN_ON() for warnings
tracing: Use div64_u64() instead of do_div()
tracing: Support to dump instance traces by ftrace_dump_on_oops
tracing: Remove second parameter to __assign_rel_str()
tracing: Add warning if string in __assign_str() does not match __string()
tracing: Add __string_len() example
tracing: Remove __assign_str_len()
ftrace: Fix most kernel-doc warnings
tracing: Decrement the snapshot if the snapshot trigger fails to register
tracing: Fix snapshot counter going between two tracers that use it
tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
tracing: Use ? : shortcut in trace macros
tracing: Do not calculate strlen() twice for __string() fields
tracing: Rework __assign_str() and __string() to not duplicate getting the string
cxl/trace: Properly initialize cxl_poison region name
net: hns3: tracing: fix hclgevf trace event strings
drm/i915: Add missing ; to __assign_str() macros in tracepoint code
NFSD: Fix nfsd_clid_class use of __string_len() macro
...
|
|
The TP_STRUCT__entry that gets assigned the region name, or an
empty string if no region is present, is erroneously initialized
to the cxl_region pointer. It needs to be properly initialized
otherwise it's length is wrong and garbage chars can appear in
the kernel trace output: /sys/kernel/tracing/trace
The bad initialization was due in part to a naming conflict with
the parameter: struct cxl_region *region. The field 'region' is
already exposed externally as the region name, so changing that
to something logical, like 'region_name' is not an option. Instead
rename the internal only struct cxl_region to the commonly used
'cxlr'.
Impact is that tooling depending on that trace data can miss
picking up a valid event when searching by region name. The
TP_printk() output, if enabled, does emit the correct region
names in the dmesg log.
This was found during testing of the cxl-list option to report
media-errors for a region.
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: stable@vger.kernel.org
Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pick up a parsing fix for the CDAT SSLBIS structure for v6.9.
|
|
Pick up support for injecting errors via ACPI EINJ into the CXL protocol
for v6.9.
|
|
Pick up support for CXL "HMEM reporting" for v6.9, i.e. build an HMAT
from CXL CDAT and PCIe switch information.
|
|
There exist card implementations with a CDAT table using a fixed size
buffer, but with entries filled in that do not fill the whole table
length size. Then, the last entry in the CDAT table may not mark the
end of the CDAT table buffer specified by the length field in the CDAT
header. It can be shorter with trailing unused (zero'ed) data. The
actual table length is determined while reading all CDAT entries of
the table with DOE.
If the table is greater than expected (containing zero'ed trailing
data), the CDAT parser fails with:
[ 48.691717] Malformed DSMAS table length: (24:0)
[ 48.702084] [CDAT:0x00] Invalid zero length
[ 48.711460] cxl_port endpoint1: Failed to parse CDAT: -22
In addition, a check of the table buffer length is missing to prevent
an out-of-bound access then parsing the CDAT table.
Hardening code against device returning borked table. Fix that by
providing an optional buffer length argument to
acpi_parse_entries_array() that can be used by cdat_table_parse() to
propagate the buffer size down to its users to check the buffer
length. This also prevents a possible out-of-bound access mentioned.
Add a check to warn about a malformed CDAT table length.
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/ZdEnopFO0Tl3t2O1@rric.localdomain
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Reading the CDAT table using DOE requires a Table Access Response
Header in addition to the CDAT entry. In current implementation this
has caused offsets with sizeof(__le32) to the actual buffers. This led
to hardly readable code and even bugs. E.g., see fix of devm_kfree()
in read_cdat_data():
commit c65efe3685f5 ("cxl/cdat: Free correct buffer on checksum error")
Rework code to avoid calculations with sizeof(__le32). Introduce
struct cdat_doe_rsp for this which contains the Table Access Response
Header and a variable payload size for various data structures
afterwards to access the CDAT table and its CDAT Data Structures
without recalculating buffer offsets.
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Fan Ni <nifan.cxl@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240216155844.406996-3-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Trivial variable rename for the DOE mailbox handle from cdat_doe to
doe_mb. The variable name cdat_doe is too ambiguous, use doe_mb that
is commonly used for the mailbox.
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240216155844.406996-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The 'entry' pointer in cdat_sslbis_handler() is set to header +
sizeof(common header). However, the math missed the addition of the SSLBIS
main header. It should be header + sizeof(common header) + sizeof(*sslbis).
Use a defined struct for all the SSLBIS parts in order to avoid pointer
math errors.
The bug causes incorrect parsing of the SSLBIS table and introduces incorrect
performance values to the access_coordinates during the CXL access_coordinate
calculation path if there are CXL switches present in the topology.
The issue was found during testing of new code being added to add additional
checks for invalid CDAT values during CXL access_coordinate calculation. The
testing was done on qemu with a CXL topology including a CXL switch.
Fixes: 80aa780dda20 ("cxl: Add callback to parse the SSLBIS subtable from CDAT")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20240301210948.1298075-1-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Export CXL helper functions in einj-cxl.c for getting/injecting
available CXL protocol error types to sysfs under kernel/debug/cxl.
The kernel/debug/cxl/einj_types file will print the available CXL
protocol errors in the same format as the available_error_types
file provided by the einj module. The
kernel/debug/cxl/$dport_dev/einj_inject file is functionally the same
as the error_type and error_inject files provided by the EINJ module,
i.e.: writing an error type into $dport_dev/einj_inject will inject
said error type into the CXL dport represented by $dport_dev.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Link: https://lore.kernel.org/r/20240311142508.31717-4-Benjamin.Cheatham@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
For the numa nodes that are not created by SRAT, no memory_target is
allocated and is not managed by the HMAT_REPORTING code. Therefore
hmat_callback() memory hotplug notifier will exit early on those NUMA
nodes. The CXL memory hotplug notifier will need to call
node_set_perf_attrs() directly in order to setup the access sysfs
attributes.
In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is
stored. Add a helper function acpi_node_backed_by_real_pxm() in order to
check if a NUMA node id is defined by SRAT or created by CFMWS.
node_set_perf_attrs() symbol is exported to allow update of perf attribs
for a node. The sysfs path of
/sys/devices/system/node/nodeX/access0/initiators/* is created by
node_set_perf_attrs() for the various attributes where nodeX is matched
to the NUMA node of the CXL region.
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-13-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
When the CXL region is formed, the driver computes the performance data
for the region. However this data is not available at the node data
collection that has been populated by the HMAT during kernel
initialization. Add a memory hotplug notifier to update the access
coordinates to the 'struct memory_target' context kept by the
HMAT_REPORTING code.
Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the
priority number to be called before HMAT_CALLBACK_PRI. The CXL update must
happen before hmat_callback().
A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in
order to allow CXL to update the memory_target access coordinates.
A new ext_updated member is added to the memory_target to indicate that
the access coordinates within the memory_target has been updated by an
external agent such as CXL. This prevents data being overwritten by the
hmat_update_target_attrs() triggered by hmat_callback().
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Huang, Ying <ying.huang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-12-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
region. The bandwidth is the aggregated bandwidth of all devices that
contribute to the CXL region. The latency is the worst latency of the
device amongst all the devices that contribute to the CXL region.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-11-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Calculate and store the performance data for a CXL region. Find the worst
read and write latency for all the included ranges from each of the devices
that attributes to the region and designate that as the latency data. Sum
all the read and write bandwidth data for each of the device region and
that is the total bandwidth for the region.
The perf list is expected to be constructed before the endpoint decoders
are registered and thus there should be no early reading of the entries
from the region assemble action. The calling of the region qos calculate
function is under the protection of cxl_dpa_rwsem and will ensure that
all DPA associated work has completed.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-10-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Move setting of cxlmd->endpoint to before calling add_device() on the port
device. Otherwise when referencing cxlmd->endpoint in region discovery code
that is triggered by the port driver probe function, the endpoint port
pointer is not valid.
Current code does not hit this issue yet since cxlmd->endpoint is not being
referenced during region discovery. However follow on code that does
performance calculations will.
Tested-by: Wonjae Lee <wj28.lee@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-9-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Retrieve the qos_class (QTG ID) using the access coordinates from the
nearest CPU rather than the nearst initiator that may not be a CPU.
This may be the more appropriate number that applications care about.
For most cases, access0 and access1 have the same values.
Link: https://lore.kernel.org/linux-cxl/20240112113023.00006c50@Huawei.com/
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-8-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The difference between access class 0 and access class 1 for 'struct
access_coordinate', if any, is that class 0 is for the distance from
the target to the closest initiator and that class 1 is for the distance
from the target to the closest CPU. For CXL memory, the nearest initiator
may not necessarily be a CPU node. The performance path from the CXL
endpoint to the host bridge should remain the same. However, the numbers
extracted and stored from HMAT is the difference for the two access
classes. Split out the performance numbers for the host bridge (generic
target) from the calculation of the entire path in order to allow
calculation of both access classes for a CXL region.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-7-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Refactor the common code of combining coordinates in order to reduce code.
Create a new function cxl_cooordinates_combine() it combine two 'struct
access_coordinate'.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-6-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
classes
Update acpi_get_genport_coordinates() to allow retrieval of both access
classes of the 'struct access_coordinate' for a generic target. The update
will allow CXL code to compute access coordinates for both access class.
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-5-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The expectation is that cxl_parse_cfwms() continues in the face the of
failure as evidenced by code like:
cxlrd = cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
if (IS_ERR(cxlrd))
return 0;
There are other error paths in that function which mistakenly follow
idiomatic expectations and return an error when they should not. Most of
those mistakes are innocuous checks that hardly ever fail in practice.
However, a recent change succeed in making the implementation more
fragile by applying an idiomatic, but still wrong "fix" [1]. In this
failure case the kernel reports:
cxl root0: Failed to populate active decoder targets
cxl_acpi ACPI0017:00: Failed to add decode range: [mem 0x00000000-0x7fffffff flags 0x200]
...which is a real issue with that one window (to be fixed separately),
but ends up failing the entirety of cxl_acpi_probe().
Undo that recent breakage while also removing the confusion about
ignoring errors. Update all exits paths to return an error per typical
expectations and let an outer wrapper function handle dropping the
error.
Fixes: 91019b5bc7c2 ("cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws()") [1]
Cc: <stable@vger.kernel.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|