diff options
345 files changed, 16648 insertions, 1806 deletions
diff --git a/Documentation/accel/amdxdna/amdnpu.rst b/Documentation/accel/amdxdna/amdnpu.rst new file mode 100644 index 000000000000..fbe0a7585345 --- /dev/null +++ b/Documentation/accel/amdxdna/amdnpu.rst @@ -0,0 +1,281 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +.. include:: <isonum.txt> + +========= + AMD NPU +========= + +:Copyright: |copy| 2024 Advanced Micro Devices, Inc. +:Author: Sonal Santan <sonal.santan@amd.com> + +Overview +======== + +AMD NPU (Neural Processing Unit) is a multi-user AI inference accelerator +integrated into AMD client APU. NPU enables efficient execution of Machine +Learning applications like CNN, LLM, etc. NPU is based on +`AMD XDNA Architecture`_. NPU is managed by **amdxdna** driver. + + +Hardware Description +==================== + +AMD NPU consists of the following hardware components: + +AMD XDNA Array +-------------- + +AMD XDNA Array comprises of 2D array of compute and memory tiles built with +`AMD AI Engine Technology`_. Each column has 4 rows of compute tiles and 1 +row of memory tile. Each compute tile contains a VLIW processor with its own +dedicated program and data memory. The memory tile acts as L2 memory. The 2D +array can be partitioned at a column boundary creating a spatially isolated +partition which can be bound to a workload context. + +Each column also has dedicated DMA engines to move data between host DDR and +memory tile. + +AMD Phoenix and AMD Hawk Point client NPU have a 4x5 topology, i.e., 4 rows of +compute tiles arranged into 5 columns. AMD Strix Point client APU have 4x8 +topology, i.e., 4 rows of compute tiles arranged into 8 columns. + +Shared L2 Memory +---------------- + +The single row of memory tiles create a pool of software managed on chip L2 +memory. DMA engines are used to move data between host DDR and memory tiles. +AMD Phoenix and AMD Hawk Point NPUs have a total of 2560 KB of L2 memory. +AMD Strix Point NPU has a total of 4096 KB of L2 memory. + +Microcontroller +--------------- + +A microcontroller runs NPU Firmware which is responsible for command processing, +XDNA Array partition setup, XDNA Array configuration, workload context +management and workload orchestration. + +NPU Firmware uses a dedicated instance of an isolated non-privileged context +called ERT to service each workload context. ERT is also used to execute user +provided ``ctrlcode`` associated with the workload context. + +NPU Firmware uses a single isolated privileged context called MERT to service +management commands from the amdxdna driver. + +Mailboxes +--------- + +The microcontroller and amdxdna driver use a privileged channel for management +tasks like setting up of contexts, telemetry, query, error handling, setting up +user channel, etc. As mentioned before, privileged channel requests are +serviced by MERT. The privileged channel is bound to a single mailbox. + +The microcontroller and amdxdna driver use a dedicated user channel per +workload context. The user channel is primarily used for submitting work to +the NPU. As mentioned before, a user channel requests are serviced by an +instance of ERT. Each user channel is bound to its own dedicated mailbox. + +PCIe EP +------- + +NPU is visible to the x86 host CPU as a PCIe device with multiple BARs and some +MSI-X interrupt vectors. NPU uses a dedicated high bandwidth SoC level fabric +for reading or writing into host memory. Each instance of ERT gets its own +dedicated MSI-X interrupt. MERT gets a single instance of MSI-X interrupt. + +The number of PCIe BARs varies depending on the specific device. Based on their +functions, PCIe BARs can generally be categorized into the following types. + +* PSP BAR: Expose the AMD PSP (Platform Security Processor) function +* SMU BAR: Expose the AMD SMU (System Management Unit) function +* SRAM BAR: Expose ring buffers for the mailbox +* Mailbox BAR: Expose the mailbox control registers (head, tail and ISR + registers etc.) +* Public Register BAR: Expose public registers + +On specific devices, the above-mentioned BAR type might be combined into a +single physical PCIe BAR. Or a module might require two physical PCIe BARs to +be fully functional. For example, + +* On AMD Phoenix device, PSP, SMU, Public Register BARs are on PCIe BAR index 0. +* On AMD Strix Point device, Mailbox and Public Register BARs are on PCIe BAR + index 0. The PSP has some registers in PCIe BAR index 0 (Public Register BAR) + and PCIe BAR index 4 (PSP BAR). + +Process Isolation Hardware +-------------------------- + +As explained before, XDNA Array can be dynamically divided into isolated +spatial partitions, each of which may have one or more columns. The spatial +partition is setup by programming the column isolation registers by the +microcontroller. Each spatial partition is associated with a PASID which is +also programmed by the microcontroller. Hence multiple spatial partitions in +the NPU can make concurrent host access protected by PASID. + +The NPU FW itself uses microcontroller MMU enforced isolated contexts for +servicing user and privileged channel requests. + + +Mixed Spatial and Temporal Scheduling +===================================== + +AMD XDNA architecture supports mixed spatial and temporal (time sharing) +scheduling of 2D array. This means that spatial partitions may be setup and +torn down dynamically to accommodate various workloads. A *spatial* partition +may be *exclusively* bound to one workload context while another partition may +be *temporarily* bound to more than one workload contexts. The microcontroller +updates the PASID for a temporarily shared partition to match the context that +has been bound to the partition at any moment. + +Resource Solver +--------------- + +The Resource Solver component of the amdxdna driver manages the allocation +of 2D array among various workloads. Every workload describes the number +of columns required to run the NPU binary in its metadata. The Resource Solver +component uses hints passed by the workload and its own heuristics to +decide 2D array (re)partition strategy and mapping of workloads for spatial and +temporal sharing of columns. The FW enforces the context-to-column(s) resource +binding decisions made by the Resource Solver. + +AMD Phoenix and AMD Hawk Point client NPU can support 6 concurrent workload +contexts. AMD Strix Point can support 16 concurrent workload contexts. + + +Application Binaries +==================== + +A NPU application workload is comprised of two separate binaries which are +generated by the NPU compiler. + +1. AMD XDNA Array overlay, which is used to configure a NPU spatial partition. + The overlay contains instructions for setting up the stream switch + configuration and ELF for the compute tiles. The overlay is loaded on the + spatial partition bound to the workload by the associated ERT instance. + Refer to the + `Versal Adaptive SoC AIE-ML Architecture Manual (AM020)`_ for more details. + +2. ``ctrlcode``, used for orchestrating the overlay loaded on the spatial + partition. ``ctrlcode`` is executed by the ERT running in protected mode on + the microcontroller in the context of the workload. ``ctrlcode`` is made up + of a sequence of opcodes named ``XAie_TxnOpcode``. Refer to the + `AI Engine Run Time`_ for more details. + + +Special Host Buffers +==================== + +Per-context Instruction Buffer +------------------------------ + +Every workload context uses a host resident 64 MB buffer which is memory +mapped into the ERT instance created to service the workload. The ``ctrlcode`` +used by the workload is copied into this special memory. This buffer is +protected by PASID like all other input/output buffers used by that workload. +Instruction buffer is also mapped into the user space of the workload. + +Global Privileged Buffer +------------------------ + +In addition, the driver also allocates a single buffer for maintenance tasks +like recording errors from MERT. This global buffer uses the global IOMMU +domain and is only accessible by MERT. + + +High-level Use Flow +=================== + +Here are the steps to run a workload on AMD NPU: + +1. Compile the workload into an overlay and a ``ctrlcode`` binary. +2. Userspace opens a context in the driver and provides the overlay. +3. The driver checks with the Resource Solver for provisioning a set of columns + for the workload. +4. The driver then asks MERT to create a context on the device with the desired + columns. +5. MERT then creates an instance of ERT. MERT also maps the Instruction Buffer + into ERT memory. +6. The userspace then copies the ``ctrlcode`` to the Instruction Buffer. +7. Userspace then creates a command buffer with pointers to input, output, and + instruction buffer; it then submits command buffer with the driver and goes + to sleep waiting for completion. +8. The driver sends the command over the Mailbox to ERT. +9. ERT *executes* the ``ctrlcode`` in the instruction buffer. +10. Execution of the ``ctrlcode`` kicks off DMAs to and from the host DDR while + AMD XDNA Array is running. +11. When ERT reaches end of ``ctrlcode``, it raises an MSI-X to send completion + signal to the driver which then wakes up the waiting workload. + + +Boot Flow +========= + +amdxdna driver uses PSP to securely load signed NPU FW and kick off the boot +of the NPU microcontroller. amdxdna driver then waits for the alive signal in +a special location on BAR 0. The NPU is switched off during SoC suspend and +turned on after resume where the NPU FW is reloaded, and the handshake is +performed again. + + +Userspace components +==================== + +Compiler +-------- + +Peano is an LLVM based open-source compiler for AMD XDNA Array compute tile +available at: +https://github.com/Xilinx/llvm-aie + +The open-source IREE compiler supports graph compilation of ML models for AMD +NPU and uses Peano underneath. It is available at: +https://github.com/nod-ai/iree-amd-aie + +Usermode Driver (UMD) +--------------------- + +The open-source XRT runtime stack interfaces with amdxdna kernel driver. XRT +can be found at: +https://github.com/Xilinx/XRT + +The open-source XRT shim for NPU is can be found at: +https://github.com/amd/xdna-driver + + +DMA Operation +============= + +DMA operation instructions are encoded in the ``ctrlcode`` as +``XAIE_IO_BLOCKWRITE`` opcode. When ERT executes ``XAIE_IO_BLOCKWRITE``, DMA +operations between host DDR and L2 memory are effected. + + +Error Handling +============== + +When MERT detects an error in AMD XDNA Array, it pauses execution for that +workload context and sends an asynchronous message to the driver over the +privileged channel. The driver then sends a buffer pointer to MERT to capture +the register states for the partition bound to faulting workload context. The +driver then decodes the error by reading the contents of the buffer pointer. + + +Telemetry +========= + +MERT can report various kinds of telemetry information like the following: + +* L1 interrupt counter +* DMA counter +* Deep Sleep counter +* etc. + + +References +========== + +- `AMD XDNA Architecture <https://www.amd.com/en/technologies/xdna.html>`_ +- `AMD AI Engine Technology <https://www.xilinx.com/products/technology/ai-engine.html>`_ +- `Peano <https://github.com/Xilinx/llvm-aie>`_ +- `Versal Adaptive SoC AIE-ML Architecture Manual (AM020) <https://docs.amd.com/r/en-US/am020-versal-aie-ml>`_ +- `AI Engine Run Time <https://github.com/Xilinx/aie-rt/tree/release/main_aig>`_ diff --git a/Documentation/accel/amdxdna/index.rst b/Documentation/accel/amdxdna/index.rst new file mode 100644 index 000000000000..38c16939f1fc --- /dev/null +++ b/Documentation/accel/amdxdna/index.rst @@ -0,0 +1,11 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +===================================== + accel/amdxdna NPU driver +===================================== + +The accel/amdxdna driver supports the AMD NPU (Neural Processing Unit). + +.. toctree:: + + amdnpu diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst index e94a0160b6a0..bc85f26533d8 100644 --- a/Documentation/accel/index.rst +++ b/Documentation/accel/index.rst @@ -8,6 +8,7 @@ Compute Accelerators :maxdepth: 1 introduction + amdxdna/index qaic/index .. only:: subproject and html diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2711-hdmi.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2711-hdmi.yaml index 5b35adf34c7b..6d11f5955b51 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2711-hdmi.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2711-hdmi.yaml @@ -14,6 +14,8 @@ properties: enum: - brcm,bcm2711-hdmi0 - brcm,bcm2711-hdmi1 + - brcm,bcm2712-hdmi0 + - brcm,bcm2712-hdmi1 reg: items: diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2835-hvs.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2835-hvs.yaml index 2e8566f47e63..f91c9dce2a44 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2835-hvs.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2835-hvs.yaml @@ -13,6 +13,7 @@ properties: compatible: enum: - brcm,bcm2711-hvs + - brcm,bcm2712-hvs - brcm,bcm2835-hvs reg: @@ -36,7 +37,9 @@ if: properties: compatible: contains: - const: brcm,bcm2711-hvs + enum: + - brcm,bcm2711-hvs + - brcm,bcm2712-hvs then: required: diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2835-pixelvalve0.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2835-pixelvalve0.yaml index 4e1ba03f6477..6b5b1d3fbc0b 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2835-pixelvalve0.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2835-pixelvalve0.yaml @@ -20,6 +20,9 @@ properties: - brcm,bcm2711-pixelvalve2 - brcm,bcm2711-pixelvalve3 - brcm,bcm2711-pixelvalve4 + - brcm,bcm2712-pixelvalve0 + - brcm,bcm2712-pixelvalve1 + - brcm,bcm2712-pixelvalve2 reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2835-txp.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2835-txp.yaml index bb186197e471..16f45afd2bad 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2835-txp.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2835-txp.yaml @@ -11,7 +11,10 @@ maintainers: properties: compatible: - const: brcm,bcm2835-txp + enum: + - brcm,bcm2712-mop + - brcm,bcm2712-moplet + - brcm,bcm2835-txp reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2835-vc4.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2835-vc4.yaml index 49a5e041aa49..2aa9d5d2afff 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2835-vc4.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2835-vc4.yaml @@ -18,6 +18,7 @@ properties: compatible: enum: - brcm,bcm2711-vc5 + - brcm,bcm2712-vc6 - brcm,bcm2835-vc4 - brcm,cygnus-vc4 diff --git a/Documentation/devicetree/bindings/display/panel/samsung,atna33xc20.yaml b/Documentation/devicetree/bindings/display/panel/samsung,atna33xc20.yaml index 032f783eefc4..684c2896d238 100644 --- a/Documentation/devicetree/bindings/display/panel/samsung,atna33xc20.yaml +++ b/Documentation/devicetree/bindings/display/panel/samsung,atna33xc20.yaml @@ -23,6 +23,8 @@ properties: - samsung,atna45af01 # Samsung 14.5" 3K (2944x1840 pixels) eDP AMOLED panel - samsung,atna45dc02 + # Samsung 15.6" 3K (2880x1620 pixels) eDP AMOLED panel + - samsung,atna56ac03 - const: samsung,atna33xc20 enable-gpios: true diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst index 8cf2f041af47..b4ee25af1702 100644 --- a/Documentation/gpu/drm-kms-helpers.rst +++ b/Documentation/gpu/drm-kms-helpers.rst @@ -221,6 +221,9 @@ Panel Helper Reference .. kernel-doc:: drivers/gpu/drm/drm_panel_orientation_quirks.c :export: +.. kernel-doc:: drivers/gpu/drm/drm_panel_backlight_quirks.c + :export: + Panel Self Refresh Helper Reference =================================== diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst index 3f07aa3b5432..92cfb25e64d3 100644 --- a/Documentation/gpu/xe/index.rst +++ b/Documentation/gpu/xe/index.rst @@ -23,4 +23,5 @@ DG2, etc is provided to prototype the driver. xe_firmware xe_tile xe_debugging + xe_devcoredump xe-drm-usage-stats.rst diff --git a/Documentation/gpu/xe/xe_devcoredump.rst b/Documentation/gpu/xe/xe_devcoredump.rst new file mode 100644 index 000000000000..ae4ec0e34dc0 --- /dev/null +++ b/Documentation/gpu/xe/xe_devcoredump.rst @@ -0,0 +1,14 @@ +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT) + +================== +Xe Device Coredump +================== + +.. kernel-doc:: drivers/gpu/drm/xe/xe_devcoredump.c + :doc: Xe device coredump + +Internal API +============ + +.. kernel-doc:: drivers/gpu/drm/xe/xe_devcoredump.c + :internal: diff --git a/MAINTAINERS b/MAINTAINERS index baf0eeb9a355..63118f36e76e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1194,6 +1194,17 @@ L: linux-spi@vger.kernel.org S: Supported F: drivers/spi/spi-amd.c +AMD XDNA DRIVER +M: Min Ma <min.ma@amd.com> +M: Lizhi Hou <lizhi.hou@amd.com> +L: dri-devel@lists.freedesktop.org +S: Supported +T: git https://gitlab.freedesktop.org/drm/misc/kernel.git +F: Documentation/accel/amdxdna/ +F: drivers/accel/amdxdna/ +F: include/trace/events/amdxdna.h +F: include/uapi/drm/amdxdna_accel.h + AMD XGBE DRIVER M: "Shyam Sundar S K" <Shyam-sundar.S-k@amd.com> L: netdev@vger.kernel.org @@ -7385,7 +7396,7 @@ L: virtualization@lists.linux.dev S: Obsolete W: https://www.kraxel.org/blog/2014/10/qemu-using-cirrus-considered-harmful/ T: git https://gitlab.freedesktop.org/drm/misc/kernel.git -F: drivers/gpu/drm/tiny/cirrus.c +F: drivers/gpu/drm/tiny/cirrus-qemu.c DRM DRIVER FOR QXL VIRTUAL GPU M: Dave Airlie <airlied@redhat.com> @@ -7796,6 +7807,7 @@ F: drivers/gpu/drm/rockchip/ DRM DRIVERS FOR STI M: Alain Volmat <alain.volmat@foss.st.com> +M: Raphael Gallais-Pou <rgallaispou@gmail.com> L: dri-devel@lists.freedesktop.org S: Maintained T: git https://gitlab.freedesktop.org/drm/misc/kernel.git diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig index 64065fb8922b..5b9490367a39 100644 --- a/drivers/accel/Kconfig +++ b/drivers/accel/Kconfig @@ -24,6 +24,7 @@ menuconfig DRM_ACCEL different device files, called accel/accel* (in /dev, sysfs and debugfs). +source "drivers/accel/amdxdna/Kconfig" source "drivers/accel/habanalabs/Kconfig" source "drivers/accel/ivpu/Kconfig" source "drivers/accel/qaic/Kconfig" diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile index ab3df932937f..a301fb6089d4 100644 --- a/drivers/accel/Makefile +++ b/drivers/accel/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_DRM_ACCEL_AMDXDNA) += amdxdna/ obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/ obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/ obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/ diff --git a/drivers/accel/amdxdna/Kconfig b/drivers/accel/amdxdna/Kconfig new file mode 100644 index 000000000000..f39d7a87296c --- /dev/null +++ b/drivers/accel/amdxdna/Kconfig @@ -0,0 +1,18 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config DRM_ACCEL_AMDXDNA + tristate "AMD AI Engine" + depends on AMD_IOMMU + depends on DRM_ACCEL + depends on PCI && HAS_IOMEM + depends on X86_64 + select DRM_SCHED + select DRM_GEM_SHMEM_HELPER + select FW_LOADER + select HMM_MIRROR + help + Choose this option to enable support for NPU integrated into AMD + client CPUs like AMD Ryzen AI 300 Series. AMD NPU can be used to + accelerate machine learning applications. + + If "M" is selected, the driver module will be amdxdna. diff --git a/drivers/accel/amdxdna/Makefile b/drivers/accel/amdxdna/Makefile new file mode 100644 index 000000000000..ed6f87910880 --- /dev/null +++ b/drivers/accel/amdxdna/Makefile @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: GPL-2.0-only + +amdxdna-y := \ + aie2_ctx.o \ + aie2_error.o \ + aie2_message.o \ + aie2_pci.o \ + aie2_psp.o \ + aie2_smu.o \ + aie2_solver.o \ + amdxdna_ctx.o \ + amdxdna_gem.o \ + amdxdna_mailbox.o \ + amdxdna_mailbox_helper.o \ + amdxdna_pci_drv.o \ + amdxdna_sysfs.o \ + npu1_regs.o \ + npu2_regs.o \ + npu4_regs.o \ + npu5_regs.o +obj-$(CONFIG_DRM_ACCEL_AMDXDNA) = amdxdna.o diff --git a/drivers/accel/amdxdna/TODO b/drivers/accel/amdxdna/TODO new file mode 100644 index 000000000000..a130259f5f70 --- /dev/null +++ b/drivers/accel/amdxdna/TODO @@ -0,0 +1,5 @@ +- Replace idr with xa +- Add import and export BO support +- Add debugfs support +- Add debug BO support +- Improve power management diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c new file mode 100644 index 000000000000..90e8d87666a9 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -0,0 +1,900 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_device.h> +#include <drm/drm_gem.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/drm_print.h> +#include <drm/drm_syncobj.h> +#include <linux/hmm.h> +#include <linux/types.h> +#include <trace/events/amdxdna.h> + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "aie2_solver.h" +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +bool force_cmdlist; +module_param(force_cmdlist, bool, 0600); +MODULE_PARM_DESC(force_cmdlist, "Force use command list (Default false)"); + +#define HWCTX_MAX_TIMEOUT 60000 /* milliseconds */ + +static void aie2_job_release(struct kref *ref) +{ + struct amdxdna_sched_job *job; + + job = container_of(ref, struct amdxdna_sched_job, refcnt); + amdxdna_sched_job_cleanup(job); + if (job->out_fence) + dma_fence_put(job->out_fence); + kfree(job); +} + +static void aie2_job_put(struct amdxdna_sched_job *job) +{ + kref_put(&job->refcnt, aie2_job_release); +} + +/* The bad_job is used in aie2_sched_job_timedout, otherwise, set it to NULL */ +static void aie2_hwctx_stop(struct amdxdna_dev *xdna, struct amdxdna_hwctx *hwctx, + struct drm_sched_job *bad_job) +{ + drm_sched_stop(&hwctx->priv->sched, bad_job); + aie2_destroy_context(xdna->dev_handle, hwctx); +} + +static int aie2_hwctx_restart(struct amdxdna_dev *xdna, struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_gem_obj *heap = hwctx->priv->heap; + int ret; + + ret = aie2_create_context(xdna->dev_handle, hwctx); + if (ret) { + XDNA_ERR(xdna, "Create hwctx failed, ret %d", ret); + goto out; + } + + ret = aie2_map_host_buf(xdna->dev_handle, hwctx->fw_ctx_id, + heap->mem.userptr, heap->mem.size); + if (ret) { + XDNA_ERR(xdna, "Map host buf failed, ret %d", ret); + goto out; + } + + if (hwctx->status != HWCTX_STAT_READY) { + XDNA_DBG(xdna, "hwctx is not ready, status %d", hwctx->status); + goto out; + } + + ret = aie2_config_cu(hwctx); + if (ret) { + XDNA_ERR(xdna, "Config cu failed, ret %d", ret); + goto out; + } + +out: + drm_sched_start(&hwctx->priv->sched, 0); + XDNA_DBG(xdna, "%s restarted, ret %d", hwctx->name, ret); + return ret; +} + +void aie2_restart_ctx(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_hwctx *hwctx; + int next = 0; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) { + if (hwctx->status != HWCTX_STAT_STOP) + continue; + + hwctx->status = hwctx->old_status; + XDNA_DBG(xdna, "Resetting %s", hwctx->name); + aie2_hwctx_restart(xdna, hwctx); + } + mutex_unlock(&client->hwctx_lock); +} + +static struct dma_fence *aie2_cmd_get_out_fence(struct amdxdna_hwctx *hwctx, u64 seq) +{ + struct dma_fence *fence, *out_fence = NULL; + int ret; + + fence = drm_syncobj_fence_get(hwctx->priv->syncobj); + if (!fence) + return NULL; + + ret = dma_fence_chain_find_seqno(&fence, seq); + if (ret) + goto out; + + out_fence = dma_fence_get(dma_fence_chain_contained(fence)); + +out: + dma_fence_put(fence); + return out_fence; +} + +static void aie2_hwctx_wait_for_idle(struct amdxdna_hwctx *hwctx) +{ + struct dma_fence *fence; + + fence = aie2_cmd_get_out_fence(hwctx, hwctx->priv->seq - 1); + if (!fence) + return; + + dma_fence_wait(fence, false); + dma_fence_put(fence); +} + +void aie2_hwctx_suspend(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + + /* + * Command timeout is unlikely. But if it happens, it doesn't + * break the system. aie2_hwctx_stop() will destroy mailbox + * and abort all commands. + */ + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + aie2_hwctx_wait_for_idle(hwctx); + aie2_hwctx_stop(xdna, hwctx, NULL); + hwctx->old_status = hwctx->status; + hwctx->status = HWCTX_STAT_STOP; +} + +void aie2_hwctx_resume(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + + /* + * The resume path cannot guarantee that mailbox channel can be + * regenerated. If this happen, when submit message to this + * mailbox channel, error will return. + */ + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + hwctx->status = hwctx->old_status; + aie2_hwctx_restart(xdna, hwctx); +} + +static void +aie2_sched_notify(struct amdxdna_sched_job *job) +{ + struct dma_fence *fence = job->fence; + + trace_xdna_job(&job->base, job->hwctx->name, "signaled fence", job->seq); + job->hwctx->priv->completed++; + dma_fence_signal(fence); + + up(&job->hwctx->priv->job_sem); + job->job_done = true; + dma_fence_put(fence); + mmput(job->mm); + aie2_job_put(job); +} + +static int +aie2_sched_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job = handle; + struct amdxdna_gem_obj *cmd_abo; + u32 ret = 0; + u32 status; + + cmd_abo = job->cmd_bo; + + if (unlikely(!data)) + goto out; + + if (unlikely(size != sizeof(u32))) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret = -EINVAL; + goto out; + } + + status = *data; + XDNA_DBG(job->hwctx->client->xdna, "Resp status 0x%x", status); + if (status == AIE2_STATUS_SUCCESS) + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_COMPLETED); + else + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ERROR); + +out: + aie2_sched_notify(job); + return ret; +} + +static int +aie2_sched_nocmd_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job = handle; + u32 ret = 0; + u32 status; + + if (unlikely(!data)) + goto out; + + if (unlikely(size != sizeof(u32))) { + ret = -EINVAL; + goto out; + } + + status = *data; + XDNA_DBG(job->hwctx->client->xdna, "Resp status 0x%x", status); + +out: + aie2_sched_notify(job); + return ret; +} + +static int +aie2_sched_cmdlist_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job = handle; + struct amdxdna_gem_obj *cmd_abo; + struct cmd_chain_resp *resp; + struct amdxdna_dev *xdna; + u32 fail_cmd_status; + u32 fail_cmd_idx; + u32 ret = 0; + + cmd_abo = job->cmd_bo; + if (unlikely(!data) || unlikely(size != sizeof(u32) * 3)) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret = -EINVAL; + goto out; + } + + resp = (struct cmd_chain_resp *)data; + xdna = job->hwctx->client->xdna; + XDNA_DBG(xdna, "Status 0x%x", resp->status); + if (resp->status == AIE2_STATUS_SUCCESS) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_COMPLETED); + goto out; + } + + /* Slow path to handle error, read from ringbuf on BAR */ + fail_cmd_idx = resp->fail_cmd_idx; + fail_cmd_status = resp->fail_cmd_status; + XDNA_DBG(xdna, "Failed cmd idx %d, status 0x%x", + fail_cmd_idx, fail_cmd_status); + + if (fail_cmd_status == AIE2_STATUS_SUCCESS) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret = -EINVAL; + goto out; + } + amdxdna_cmd_set_state(cmd_abo, fail_cmd_status); + + if (amdxdna_cmd_get_op(cmd_abo) == ERT_CMD_CHAIN) { + struct amdxdna_cmd_chain *cc = amdxdna_cmd_get_payload(cmd_abo, NULL); + + cc->error_index = fail_cmd_idx; + if (cc->error_index >= cc->command_count) + cc->error_index = 0; + } +out: + aie2_sched_notify(job); + return ret; +} + +static struct dma_fence * +aie2_sched_job_run(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job); + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + struct amdxdna_hwctx *hwctx = job->hwctx; + struct dma_fence *fence; + int ret; + + if (!mmget_not_zero(job->mm)) + return ERR_PTR(-ESRCH); + + kref_get(&job->refcnt); + fence = dma_fence_get(job->fence); + + if (unlikely(!cmd_abo)) { + ret = aie2_sync_bo(hwctx, job, aie2_sched_nocmd_resp_handler); + goto out; + } + + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_NEW); + + if (amdxdna_cmd_get_op(cmd_abo) == ERT_CMD_CHAIN) + ret = aie2_cmdlist_multi_execbuf(hwctx, job, aie2_sched_cmdlist_resp_handler); + else if (force_cmdlist) + ret = aie2_cmdlist_single_execbuf(hwctx, job, aie2_sched_cmdlist_resp_handler); + else + ret = aie2_execbuf(hwctx, job, aie2_sched_resp_handler); + +out: + if (ret) { + dma_fence_put(job->fence); + aie2_job_put(job); + mmput(job->mm); + fence = ERR_PTR(ret); + } + trace_xdna_job(sched_job, hwctx->name, "sent to device", job->seq); + + return fence; +} + +static void aie2_sched_job_free(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job); + struct amdxdna_hwctx *hwctx = job->hwctx; + + trace_xdna_job(sched_job, hwctx->name, "job free", job->seq); + if (!job->job_done) + up(&hwctx->priv->job_sem); + + drm_sched_job_cleanup(sched_job); + aie2_job_put(job); +} + +static enum drm_gpu_sched_stat +aie2_sched_job_timedout(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job); + struct amdxdna_hwctx *hwctx = job->hwctx; + struct amdxdna_dev *xdna; + + xdna = hwctx->client->xdna; + trace_xdna_job(sched_job, hwctx->name, "job timedout", job->seq); + mutex_lock(&xdna->dev_lock); + aie2_hwctx_stop(xdna, hwctx, sched_job); + + aie2_hwctx_restart(xdna, hwctx); + mutex_unlock(&xdna->dev_lock); + + return DRM_GPU_SCHED_STAT_NOMINAL; +} + +const struct drm_sched_backend_ops sched_ops = { + .run_job = aie2_sched_job_run, + .free_job = aie2_sched_job_free, + .timedout_job = aie2_sched_job_timedout, +}; + +static int aie2_hwctx_col_list(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct amdxdna_dev_hdl *ndev; + int start, end, first, last; + u32 width = 1, entries = 0; + int i; + + if (!hwctx->num_tiles) { + XDNA_ERR(xdna, "Number of tiles is zero"); + return -EINVAL; + } + + ndev = xdna->dev_handle; + if (unlikely(!ndev->metadata.core.row_count)) { + XDNA_WARN(xdna, "Core tile row count is zero"); + return -EINVAL; + } + + hwctx->num_col = hwctx->num_tiles / ndev->metadata.core.row_count; + if (!hwctx->num_col || hwctx->num_col > ndev->total_col) { + XDNA_ERR(xdna, "Invalid num_col %d", hwctx->num_col); + return -EINVAL; + } + + if (ndev->priv->col_align == COL_ALIGN_NATURE) + width = hwctx->num_col; + + /* + * In range [start, end], find out columns that is multiple of width. + * 'first' is the first column, + * 'last' is the last column, + * 'entries' is the total number of columns. + */ + start = xdna->dev_info->first_col; + end = ndev->total_col - hwctx->num_col; + if (start > 0 && end == 0) { + XDNA_DBG(xdna, "Force start from col 0"); + start = 0; + } + first = start + (width - start % width) % width; + last = end - end % width; + if (last >= first) + entries = (last - first) / width + 1; + XDNA_DBG(xdna, "start %d end %d first %d last %d", + start, end, first, last); + + if (unlikely(!entries)) { + XDNA_ERR(xdna, "Start %d end %d width %d", + start, end, width); + return -EINVAL; + } + + hwctx->col_list = kmalloc_array(entries, sizeof(*hwctx->col_list), GFP_KERNEL); + if (!hwctx->col_list) + return -ENOMEM; + + hwctx->col_list_len = entries; + hwctx->col_list[0] = first; + for (i = 1; i < entries; i++) + hwctx->col_list[i] = hwctx->col_list[i - 1] + width; + + print_hex_dump_debug("col_list: ", DUMP_PREFIX_OFFSET, 16, 4, hwctx->col_list, + entries * sizeof(*hwctx->col_list), false); + return 0; +} + +static int aie2_alloc_resource(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct alloc_requests *xrs_req; + int ret; + + xrs_req = kzalloc(sizeof(*xrs_req), GFP_KERNEL); + if (!xrs_req) + return -ENOMEM; + + xrs_req->cdo.start_cols = hwctx->col_list; + xrs_req->cdo.cols_len = hwctx->col_list_len; + xrs_req->cdo.ncols = hwctx->num_col; + xrs_req->cdo.qos_cap.opc = hwctx->max_opc; + + xrs_req->rqos.gops = hwctx->qos.gops; + xrs_req->rqos.fps = hwctx->qos.fps; + xrs_req->rqos.dma_bw = hwctx->qos.dma_bandwidth; + xrs_req->rqos.latency = hwctx->qos.latency; + xrs_req->rqos.exec_time = hwctx->qos.frame_exec_time; + xrs_req->rqos.priority = hwctx->qos.priority; + + xrs_req->rid = (uintptr_t)hwctx; + + ret = xrs_allocate_resource(xdna->xrs_hdl, xrs_req, hwctx); + if (ret) + XDNA_ERR(xdna, "Allocate AIE resource failed, ret %d", ret); + + kfree(xrs_req); + return ret; +} + +static void aie2_release_resource(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + int ret; + + ret = xrs_release_resource(xdna->xrs_hdl, (uintptr_t)hwctx); + if (ret) + XDNA_ERR(xdna, "Release AIE resource failed, ret %d", ret); +} + +static int aie2_ctx_syncobj_create(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct drm_file *filp = hwctx->client->filp; + struct drm_syncobj *syncobj; + u32 hdl; + int ret; + + hwctx->syncobj_hdl = AMDXDNA_INVALID_FENCE_HANDLE; + + ret = drm_syncobj_create(&syncobj, 0, NULL); + if (ret) { + XDNA_ERR(xdna, "Create ctx syncobj failed, ret %d", ret); + return ret; + } + ret = drm_syncobj_get_handle(filp, syncobj, &hdl); + if (ret) { + drm_syncobj_put(syncobj); + XDNA_ERR(xdna, "Create ctx syncobj handle failed, ret %d", ret); + return ret; + } + hwctx->priv->syncobj = syncobj; + hwctx->syncobj_hdl = hdl; + + return 0; +} + +static void aie2_ctx_syncobj_destroy(struct amdxdna_hwctx *hwctx) +{ + /* + * The syncobj_hdl is owned by user space and will be cleaned up + * separately. + */ + drm_syncobj_put(hwctx->priv->syncobj); +} + +int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_client *client = hwctx->client; + struct amdxdna_dev *xdna = client->xdna; + struct drm_gpu_scheduler *sched; + struct amdxdna_hwctx_priv *priv; + struct amdxdna_gem_obj *heap; + int i, ret; + + priv = kzalloc(sizeof(*hwctx->priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + hwctx->priv = priv; + + mutex_lock(&client->mm_lock); + heap = client->dev_heap; + if (!heap) { + XDNA_ERR(xdna, "The client dev heap object not exist"); + mutex_unlock(&client->mm_lock); + ret = -ENOENT; + goto free_priv; + } + drm_gem_object_get(to_gobj(heap)); + mutex_unlock(&client->mm_lock); + priv->heap = heap; + sema_init(&priv->job_sem, HWCTX_MAX_CMDS); + + ret = amdxdna_gem_pin(heap); + if (ret) { + XDNA_ERR(xdna, "Dev heap pin failed, ret %d", ret); + goto put_heap; + } + + for (i = 0; i < ARRAY_SIZE(priv->cmd_buf); i++) { + struct amdxdna_gem_obj *abo; + struct amdxdna_drm_create_bo args = { + .flags = 0, + .type = AMDXDNA_BO_DEV, + .vaddr = 0, + .size = MAX_CHAIN_CMDBUF_SIZE, + }; + + abo = amdxdna_drm_alloc_dev_bo(&xdna->ddev, &args, client->filp, true); + if (IS_ERR(abo)) { + ret = PTR_ERR(abo); + goto free_cmd_bufs; + } + + XDNA_DBG(xdna, "Command buf %d addr 0x%llx size 0x%lx", + i, abo->mem.dev_addr, abo->mem.size); + priv->cmd_buf[i] = abo; + } + + sched = &priv->sched; + mutex_init(&priv->io_lock); + + fs_reclaim_acquire(GFP_KERNEL); + might_lock(&priv->io_lock); + fs_reclaim_release(GFP_KERNEL); + + ret = drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT, + HWCTX_MAX_CMDS, 0, msecs_to_jiffies(HWCTX_MAX_TIMEOUT), + NULL, NULL, hwctx->name, xdna->ddev.dev); + if (ret) { + XDNA_ERR(xdna, "Failed to init DRM scheduler. ret %d", ret); + goto free_cmd_bufs; + } + + ret = drm_sched_entity_init(&priv->entity, DRM_SCHED_PRIORITY_NORMAL, + &sched, 1, NULL); + if (ret) { + XDNA_ERR(xdna, "Failed to initial sched entiry. ret %d", ret); + goto free_sched; + } + + ret = aie2_hwctx_col_list(hwctx); + if (ret) { + XDNA_ERR(xdna, "Create col list failed, ret %d", ret); + goto free_entity; + } + + ret = aie2_alloc_resource(hwctx); + if (ret) { + XDNA_ERR(xdna, "Alloc hw resource failed, ret %d", ret); + goto free_col_list; + } + + ret = aie2_map_host_buf(xdna->dev_handle, hwctx->fw_ctx_id, + heap->mem.userptr, heap->mem.size); + if (ret) { + XDNA_ERR(xdna, "Map host buffer failed, ret %d", ret); + goto release_resource; + } + + ret = aie2_ctx_syncobj_create(hwctx); + if (ret) { + XDNA_ERR(xdna, "Create syncobj failed, ret %d", ret); + goto release_resource; + } + + hwctx->status = HWCTX_STAT_INIT; + + XDNA_DBG(xdna, "hwctx %s init completed", hwctx->name); + + return 0; + +release_resource: + aie2_release_resource(hwctx); +free_col_list: + kfree(hwctx->col_list); +free_entity: + drm_sched_entity_destroy(&priv->entity); +free_sched: + drm_sched_fini(&priv->sched); +free_cmd_bufs: + for (i = 0; i < ARRAY_SIZE(priv->cmd_buf); i++) { + if (!priv->cmd_buf[i]) + continue; + drm_gem_object_put(to_gobj(priv->cmd_buf[i])); + } + amdxdna_gem_unpin(heap); +put_heap: + drm_gem_object_put(to_gobj(heap)); +free_priv: + kfree(priv); + return ret; +} + +void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna; + int idx; + + xdna = hwctx->client->xdna; + drm_sched_wqueue_stop(&hwctx->priv->sched); + + /* Now, scheduler will not send command to device. */ + aie2_release_resource(hwctx); + + /* + * All submitted commands are aborted. + * Restart scheduler queues to cleanup jobs. The amdxdna_sched_job_run() + * will return NODEV if it is called. + */ + drm_sched_wqueue_start(&hwctx->priv->sched); + + aie2_hwctx_wait_for_idle(hwctx); + drm_sched_entity_destroy(&hwctx->priv->entity); + drm_sched_fini(&hwctx->priv->sched); + aie2_ctx_syncobj_destroy(hwctx); + + XDNA_DBG(xdna, "%s sequence number %lld", hwctx->name, hwctx->priv->seq); + + for (idx = 0; idx < ARRAY_SIZE(hwctx->priv->cmd_buf); idx++) + drm_gem_object_put(to_gobj(hwctx->priv->cmd_buf[idx])); + amdxdna_gem_unpin(hwctx->priv->heap); + drm_gem_object_put(to_gobj(hwctx->priv->heap)); + + mutex_destroy(&hwctx->priv->io_lock); + kfree(hwctx->col_list); + kfree(hwctx->priv); + kfree(hwctx->cus); +} + +static int aie2_hwctx_cu_config(struct amdxdna_hwctx *hwctx, void *buf, u32 size) +{ + struct amdxdna_hwctx_param_config_cu *config = buf; + struct amdxdna_dev *xdna = hwctx->client->xdna; + u32 total_size; + int ret; + + XDNA_DBG(xdna, "Config %d CU to %s", config->num_cus, hwctx->name); + if (hwctx->status != HWCTX_STAT_INIT) { + XDNA_ERR(xdna, "Not support re-config CU"); + return -EINVAL; + } + + if (!config->num_cus) { + XDNA_ERR(xdna, "Number of CU is zero"); + return -EINVAL; + } + + total_size = struct_size(config, cu_configs, config->num_cus); + if (total_size > size) { + XDNA_ERR(xdna, "CU config larger than size"); + return -EINVAL; + } + + hwctx->cus = kmemdup(config, total_size, GFP_KERNEL); + if (!hwctx->cus) + return -ENOMEM; + + ret = aie2_config_cu(hwctx); + if (ret) { + XDNA_ERR(xdna, "Config CU to firmware failed, ret %d", ret); + goto free_cus; + } + + wmb(); /* To avoid locking in command submit when check status */ + hwctx->status = HWCTX_STAT_READY; + + return 0; + +free_cus: + kfree(hwctx->cus); + hwctx->cus = NULL; + return ret; +} + +int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *buf, u32 size) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + switch (type) { + case DRM_AMDXDNA_HWCTX_CONFIG_CU: + return aie2_hwctx_cu_config(hwctx, buf, size); + case DRM_AMDXDNA_HWCTX_ASSIGN_DBG_BUF: + case DRM_AMDXDNA_HWCTX_REMOVE_DBG_BUF: + return -EOPNOTSUPP; + default: + XDNA_DBG(xdna, "Not supported type %d", type); + return -EOPNOTSUPP; + } +} + +static int aie2_populate_range(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + struct mm_struct *mm = abo->mem.notifier.mm; + struct hmm_range range = { 0 }; + unsigned long timeout; + int ret; + + XDNA_INFO_ONCE(xdna, "populate memory range %llx size %lx", + abo->mem.userptr, abo->mem.size); + range.notifier = &abo->mem.notifier; + range.start = abo->mem.userptr; + range.end = abo->mem.userptr + abo->mem.size; + range.hmm_pfns = abo->mem.pfns; + range.default_flags = HMM_PFN_REQ_FAULT; + + if (!mmget_not_zero(mm)) + return -EFAULT; + + timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); +again: + range.notifier_seq = mmu_interval_read_begin(&abo->mem.notifier); + mmap_read_lock(mm); + ret = hmm_range_fault(&range); + mmap_read_unlock(mm); + if (ret) { + if (time_after(jiffies, timeout)) { + ret = -ETIME; + goto put_mm; + } + + if (ret == -EBUSY) + goto again; + + goto put_mm; + } + + down_read(&xdna->notifier_lock); + if (mmu_interval_read_retry(&abo->mem.notifier, range.notifier_seq)) { + up_read(&xdna->notifier_lock); + goto again; + } + abo->mem.map_invalid = false; + up_read(&xdna->notifier_lock); + +put_mm: + mmput(mm); + return ret; +} + +int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, u64 *seq) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct ww_acquire_ctx acquire_ctx; + struct dma_fence_chain *chain; + struct amdxdna_gem_obj *abo; + unsigned long timeout = 0; + int ret, i; + + ret = down_interruptible(&hwctx->priv->job_sem); + if (ret) { + XDNA_ERR(xdna, "Grab job sem failed, ret %d", ret); + return ret; + } + + chain = dma_fence_chain_alloc(); + if (!chain) { + XDNA_ERR(xdna, "Alloc fence chain failed"); + ret = -ENOMEM; + goto up_sem; + } + + ret = drm_sched_job_init(&job->base, &hwctx->priv->entity, 1, hwctx); + if (ret) { + XDNA_ERR(xdna, "DRM job init failed, ret %d", ret); + goto free_chain; + } + +retry: + ret = drm_gem_lock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + if (ret) { + XDNA_WARN(xdna, "Failed to lock BOs, ret %d", ret); + goto cleanup_job; + } + + for (i = 0; i < job->bo_cnt; i++) { + ret = dma_resv_reserve_fences(job->bos[i]->resv, 1); + if (ret) { + XDNA_WARN(xdna, "Failed to reserve fences %d", ret); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + goto cleanup_job; + } + } + + down_read(&xdna->notifier_lock); + for (i = 0; i < job->bo_cnt; i++) { + abo = to_xdna_obj(job->bos[i]); + if (abo->mem.map_invalid) { + up_read(&xdna->notifier_lock); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + if (!timeout) { + timeout = jiffies + + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); + } else if (time_after(jiffies, timeout)) { + ret = -ETIME; + goto cleanup_job; + } + + ret = aie2_populate_range(abo); + if (ret) + goto cleanup_job; + goto retry; + } + } + + mutex_lock(&hwctx->priv->io_lock); + drm_sched_job_arm(&job->base); + job->out_fence = dma_fence_get(&job->base.s_fence->finished); + for (i = 0; i < job->bo_cnt; i++) + dma_resv_add_fence(job->bos[i]->resv, job->out_fence, DMA_RESV_USAGE_WRITE); + job->seq = hwctx->priv->seq++; + kref_get(&job->refcnt); + drm_sched_entity_push_job(&job->base); + + *seq = job->seq; + drm_syncobj_add_point(hwctx->priv->syncobj, chain, job->out_fence, *seq); + mutex_unlock(&hwctx->priv->io_lock); + + up_read(&xdna->notifier_lock); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + + aie2_job_put(job); + + return 0; + +cleanup_job: + drm_sched_job_cleanup(&job->base); +free_chain: + dma_fence_chain_free(chain); +up_sem: + up(&hwctx->priv->job_sem); + job->job_done = true; + return ret; +} + +void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, + unsigned long cur_seq) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + struct drm_gem_object *gobj = to_gobj(abo); + long ret; + + down_write(&xdna->notifier_lock); + abo->mem.map_invalid = true; + mmu_interval_set_seq(&abo->mem.notifier, cur_seq); + up_write(&xdna->notifier_lock); + ret = dma_resv_wait_timeout(gobj->resv, DMA_RESV_USAGE_BOOKKEEP, + true, MAX_SCHEDULE_TIMEOUT); + if (!ret || ret == -ERESTARTSYS) + XDNA_ERR(xdna, "Failed to wait for bo, ret %ld", ret); +} diff --git a/drivers/accel/amdxdna/aie2_error.c b/drivers/accel/amdxdna/aie2_error.c new file mode 100644 index 000000000000..b1defaa8513b --- /dev/null +++ b/drivers/accel/amdxdna/aie2_error.c @@ -0,0 +1,360 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/drm_cache.h> +#include <drm/drm_device.h> +#include <drm/drm_print.h> +#include <drm/gpu_scheduler.h> +#include <linux/dma-mapping.h> +#include <linux/kthread.h> +#include <linux/kernel.h> + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +struct async_event { + struct amdxdna_dev_hdl *ndev; + struct async_event_msg_resp resp; + struct workqueue_struct *wq; + struct work_struct work; + u8 *buf; + dma_addr_t addr; + u32 size; +}; + +struct async_events { + struct workqueue_struct *wq; + u8 *buf; + dma_addr_t addr; + u32 size; + u32 event_cnt; + struct async_event event[] __counted_by(event_cnt); +}; + +/* + * Below enum, struct and lookup tables are porting from XAIE util header file. + * + * Below data is defined by AIE device and it is used for decode error message + * from the device. + */ + +enum aie_module_type { + AIE_MEM_MOD = 0, + AIE_CORE_MOD, + AIE_PL_MOD, +}; + +enum aie_error_category { + AIE_ERROR_SATURATION = 0, + AIE_ERROR_FP, + AIE_ERROR_STREAM, + AIE_ERROR_ACCESS, + AIE_ERROR_BUS, + AIE_ERROR_INSTRUCTION, + AIE_ERROR_ECC, + AIE_ERROR_LOCK, + AIE_ERROR_DMA, + AIE_ERROR_MEM_PARITY, + /* Unknown is not from XAIE, added for better category */ + AIE_ERROR_UNKNOWN, +}; + +/* Don't pack, unless XAIE side changed */ +struct aie_error { + __u8 row; + __u8 col; + __u32 mod_type; + __u8 event_id; +}; + +struct aie_err_info { + u32 err_cnt; + u32 ret_code; + u32 rsvd; + struct aie_error payload[] __counted_by(err_cnt); +}; + +struct aie_event_category { + u8 event_id; + enum aie_error_category category; +}; + +#define EVENT_CATEGORY(id, cat) { id, cat } +static const struct aie_event_category aie_ml_mem_event_cat[] = { + EVENT_CATEGORY(88U, AIE_ERROR_ECC), + EVENT_CATEGORY(90U, AIE_ERROR_ECC), + EVENT_CATEGORY(91U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(92U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(93U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(94U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(95U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(96U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(97U, AIE_ERROR_DMA), + EVENT_CATEGORY(98U, AIE_ERROR_DMA), + EVENT_CATEGORY(99U, AIE_ERROR_DMA), + EVENT_CATEGORY(100U, AIE_ERROR_DMA), + EVENT_CATEGORY(101U, AIE_ERROR_LOCK), +}; + +static const struct aie_event_category aie_ml_core_event_cat[] = { + EVENT_CATEGORY(55U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(56U, AIE_ERROR_STREAM), + EVENT_CATEGORY(57U, AIE_ERROR_STREAM), + EVENT_CATEGORY(58U, AIE_ERROR_BUS), + EVENT_CATEGORY(59U, AIE_ERROR_INSTRUCTION), + EVENT_CATEGORY(60U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(62U, AIE_ERROR_ECC), + EVENT_CATEGORY(64U, AIE_ERROR_ECC), + EVENT_CATEGORY(65U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(66U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(67U, AIE_ERROR_LOCK), + EVENT_CATEGORY(70U, AIE_ERROR_INSTRUCTION), + EVENT_CATEGORY(71U, AIE_ERROR_STREAM), + EVENT_CATEGORY(72U, AIE_ERROR_BUS), +}; + +static const struct aie_event_category aie_ml_mem_tile_event_cat[] = { + EVENT_CATEGORY(130U, AIE_ERROR_ECC), + EVENT_CATEGORY(132U, AIE_ERROR_ECC), + EVENT_CATEGORY(133U, AIE_ERROR_DMA), + EVENT_CATEGORY(134U, AIE_ERROR_DMA), + EVENT_CATEGORY(135U, AIE_ERROR_STREAM), + EVENT_CATEGORY(136U, AIE_ERROR_STREAM), + EVENT_CATEGORY(137U, AIE_ERROR_STREAM), + EVENT_CATEGORY(138U, AIE_ERROR_BUS), + EVENT_CATEGORY(139U, AIE_ERROR_LOCK), +}; + +static const struct aie_event_category aie_ml_shim_tile_event_cat[] = { + EVENT_CATEGORY(64U, AIE_ERROR_BUS), + EVENT_CATEGORY(65U, AIE_ERROR_STREAM), + EVENT_CATEGORY(66U, AIE_ERROR_STREAM), + EVENT_CATEGORY(67U, AIE_ERROR_BUS), + EVENT_CATEGORY(68U, AIE_ERROR_BUS), + EVENT_CATEGORY(69U, AIE_ERROR_BUS), + EVENT_CATEGORY(70U, AIE_ERROR_BUS), + EVENT_CATEGORY(71U, AIE_ERROR_BUS), + EVENT_CATEGORY(72U, AIE_ERROR_DMA), + EVENT_CATEGORY(73U, AIE_ERROR_DMA), + EVENT_CATEGORY(74U, AIE_ERROR_LOCK), +}; + +static enum aie_error_category +aie_get_error_category(u8 row, u8 event_id, enum aie_module_type mod_type) +{ + const struct aie_event_category *lut; + int num_entry; + int i; + + switch (mod_type) { + case AIE_PL_MOD: + lut = aie_ml_shim_tile_event_cat; + num_entry = ARRAY_SIZE(aie_ml_shim_tile_event_cat); + break; + case AIE_CORE_MOD: + lut = aie_ml_core_event_cat; + num_entry = ARRAY_SIZE(aie_ml_core_event_cat); + break; + case AIE_MEM_MOD: + if (row == 1) { + lut = aie_ml_mem_tile_event_cat; + num_entry = ARRAY_SIZE(aie_ml_mem_tile_event_cat); + } else { + lut = aie_ml_mem_event_cat; + num_entry = ARRAY_SIZE(aie_ml_mem_event_cat); + } + break; + default: + return AIE_ERROR_UNKNOWN; + } + + for (i = 0; i < num_entry; i++) { + if (event_id != lut[i].event_id) + continue; + + return lut[i].category; + } + + return AIE_ERROR_UNKNOWN; +} + +static u32 aie2_error_backtrack(struct amdxdna_dev_hdl *ndev, void *err_info, u32 num_err) +{ + struct aie_error *errs = err_info; + u32 err_col = 0; /* assume that AIE has less than 32 columns */ + int i; + + /* Get err column bitmap */ + for (i = 0; i < num_err; i++) { + struct aie_error *err = &errs[i]; + enum aie_error_category cat; + + cat = aie_get_error_category(err->row, err->event_id, err->mod_type); + XDNA_ERR(ndev->xdna, "Row: %d, Col: %d, module %d, event ID %d, category %d", + err->row, err->col, err->mod_type, + err->event_id, cat); + + if (err->col >= 32) { + XDNA_WARN(ndev->xdna, "Invalid column number"); + break; + } + + err_col |= (1 << err->col); + } + + return err_col; +} + +static int aie2_error_async_cb(void *handle, const u32 *data, size_t size) +{ + struct async_event_msg_resp *resp; + struct async_event *e = handle; + + if (data) { + resp = (struct async_event_msg_resp *)data; + e->resp.type = resp->type; + wmb(); /* Update status in the end, so that no lock for here */ + e->resp.status = resp->status; + } + queue_work(e->wq, &e->work); + return 0; +} + +static int aie2_error_event_send(struct async_event *e) +{ + drm_clflush_virt_range(e->buf, e->size); /* device can access */ + return aie2_register_asyn_event_msg(e->ndev, e->addr, e->size, e, + aie2_error_async_cb); +} + +static void aie2_error_worker(struct work_struct *err_work) +{ + struct aie_err_info *info; + struct amdxdna_dev *xdna; + struct async_event *e; + u32 max_err; + u32 err_col; + + e = container_of(err_work, struct async_event, work); + + xdna = e->ndev->xdna; + + if (e->resp.status == MAX_AIE2_STATUS_CODE) + return; + + e->resp.status = MAX_AIE2_STATUS_CODE; + + print_hex_dump_debug("AIE error: ", DUMP_PREFIX_OFFSET, 16, 4, + e->buf, 0x100, false); + + info = (struct aie_err_info *)e->buf; + XDNA_DBG(xdna, "Error count %d return code %d", info->err_cnt, info->ret_code); + + max_err = (e->size - sizeof(*info)) / sizeof(struct aie_error); + if (unlikely(info->err_cnt > max_err)) { + WARN_ONCE(1, "Error count too large %d\n", info->err_cnt); + return; + } + err_col = aie2_error_backtrack(e->ndev, info->payload, info->err_cnt); + if (!err_col) { + XDNA_WARN(xdna, "Did not get error column"); + return; + } + + mutex_lock(&xdna->dev_lock); + /* Re-sent this event to firmware */ + if (aie2_error_event_send(e)) + XDNA_WARN(xdna, "Unable to register async event"); + mutex_unlock(&xdna->dev_lock); +} + +int aie2_error_async_events_send(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna = ndev->xdna; + struct async_event *e; + int i, ret; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + for (i = 0; i < ndev->async_events->event_cnt; i++) { + e = &ndev->async_events->event[i]; + ret = aie2_error_event_send(e); + if (ret) + return ret; + } + + return 0; +} + +void aie2_error_async_events_free(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna = ndev->xdna; + struct async_events *events; + + events = ndev->async_events; + + mutex_unlock(&xdna->dev_lock); + destroy_workqueue(events->wq); + mutex_lock(&xdna->dev_lock); + + dma_free_noncoherent(xdna->ddev.dev, events->size, events->buf, + events->addr, DMA_FROM_DEVICE); + kfree(events); +} + +int aie2_error_async_events_alloc(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna = ndev->xdna; + u32 total_col = ndev->total_col; + u32 total_size = ASYNC_BUF_SIZE * total_col; + struct async_events *events; + int i, ret; + + events = kzalloc(struct_size(events, event, total_col), GFP_KERNEL); + if (!events) + return -ENOMEM; + + events->buf = dma_alloc_noncoherent(xdna->ddev.dev, total_size, &events->addr, + DMA_FROM_DEVICE, GFP_KERNEL); + if (!events->buf) { + ret = -ENOMEM; + goto free_events; + } + events->size = total_size; + events->event_cnt = total_col; + + events->wq = alloc_ordered_workqueue("async_wq", 0); + if (!events->wq) { + ret = -ENOMEM; + goto free_buf; + } + + for (i = 0; i < events->event_cnt; i++) { + struct async_event *e = &events->event[i]; + u32 offset = i * ASYNC_BUF_SIZE; + + e->ndev = ndev; + e->wq = events->wq; + e->buf = &events->buf[offset]; + e->addr = events->addr + offset; + e->size = ASYNC_BUF_SIZE; + e->resp.status = MAX_AIE2_STATUS_CODE; + INIT_WORK(&e->work, aie2_error_worker); + } + + ndev->async_events = events; + + XDNA_DBG(xdna, "Async event count %d, buf total size 0x%x", + events->event_cnt, events->size); + return 0; + +free_buf: + dma_free_noncoherent(xdna->ddev.dev, events->size, events->buf, + events->addr, DMA_FROM_DEVICE); +free_events: + kfree(events); + return ret; +} diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c new file mode 100644 index 000000000000..c01a1d957b56 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_message.c @@ -0,0 +1,791 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_cache.h> +#include <drm/drm_device.h> +#include <drm/drm_gem.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/drm_print.h> +#include <drm/gpu_scheduler.h> +#include <linux/bitfield.h> +#include <linux/errno.h> +#include <linux/pci.h> +#include <linux/types.h> + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_mailbox_helper.h" +#include "amdxdna_pci_drv.h" + +#define DECLARE_AIE2_MSG(name, op) \ + DECLARE_XDNA_MSG_COMMON(name, op, MAX_AIE2_STATUS_CODE) + +static int aie2_send_mgmt_msg_wait(struct amdxdna_dev_hdl *ndev, + struct xdna_mailbox_msg *msg) +{ + struct amdxdna_dev *xdna = ndev->xdna; + struct xdna_notify *hdl = msg->handle; + int ret; + + if (!ndev->mgmt_chann) + return -ENODEV; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + ret = xdna_send_msg_wait(xdna, ndev->mgmt_chann, msg); + if (ret == -ETIME) { + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); + ndev->mgmt_chann = NULL; + } + + if (!ret && *hdl->data != AIE2_STATUS_SUCCESS) { + XDNA_ERR(xdna, "command opcode 0x%x failed, status 0x%x", + msg->opcode, *hdl->data); + ret = -EINVAL; + } + + return ret; +} + +int aie2_suspend_fw(struct amdxdna_dev_hdl *ndev) +{ + DECLARE_AIE2_MSG(suspend, MSG_OP_SUSPEND); + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_resume_fw(struct amdxdna_dev_hdl *ndev) +{ + DECLARE_AIE2_MSG(suspend, MSG_OP_RESUME); + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_set_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 value) +{ + DECLARE_AIE2_MSG(set_runtime_cfg, MSG_OP_SET_RUNTIME_CONFIG); + + req.type = type; + req.value = value; + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_get_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 *value) +{ + DECLARE_AIE2_MSG(get_runtime_cfg, MSG_OP_GET_RUNTIME_CONFIG); + int ret; + + req.type = type; + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) { + XDNA_ERR(ndev->xdna, "Failed to get runtime config, ret %d", ret); + return ret; + } + + *value = resp.value; + return 0; +} + +int aie2_check_protocol_version(struct amdxdna_dev_hdl *ndev) +{ + DECLARE_AIE2_MSG(protocol_version, MSG_OP_GET_PROTOCOL_VERSION); + struct amdxdna_dev *xdna = ndev->xdna; + int ret; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) { + XDNA_ERR(xdna, "Failed to get protocol version, ret %d", ret); + return ret; + } + + if (resp.major != ndev->priv->protocol_major) { + XDNA_ERR(xdna, "Incompatible firmware protocol version major %d minor %d", + resp.major, resp.minor); + return -EINVAL; + } + + if (resp.minor < ndev->priv->protocol_minor) { + XDNA_ERR(xdna, "Firmware minor version smaller than supported"); + return -EINVAL; + } + + return 0; +} + +int aie2_assign_mgmt_pasid(struct amdxdna_dev_hdl *ndev, u16 pasid) +{ + DECLARE_AIE2_MSG(assign_mgmt_pasid, MSG_OP_ASSIGN_MGMT_PASID); + + req.pasid = pasid; + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_query_aie_version(struct amdxdna_dev_hdl *ndev, struct aie_version *version) +{ + DECLARE_AIE2_MSG(aie_version_info, MSG_OP_QUERY_AIE_VERSION); + struct amdxdna_dev *xdna = ndev->xdna; + int ret; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + XDNA_DBG(xdna, "Query AIE version - major: %u minor: %u completed", + resp.major, resp.minor); + + version->major = resp.major; + version->minor = resp.minor; + + return 0; +} + +int aie2_query_aie_metadata(struct amdxdna_dev_hdl *ndev, struct aie_metadata *metadata) +{ + DECLARE_AIE2_MSG(aie_tile_info, MSG_OP_QUERY_AIE_TILE_INFO); + int ret; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + metadata->size = resp.info.size; + metadata->cols = resp.info.cols; + metadata->rows = resp.info.rows; + + metadata->version.major = resp.info.major; + metadata->version.minor = resp.info.minor; + + metadata->core.row_count = resp.info.core_rows; + metadata->core.row_start = resp.info.core_row_start; + metadata->core.dma_channel_count = resp.info.core_dma_channels; + metadata->core.lock_count = resp.info.core_locks; + metadata->core.event_reg_count = resp.info.core_events; + + metadata->mem.row_count = resp.info.mem_rows; + metadata->mem.row_start = resp.info.mem_row_start; + metadata->mem.dma_channel_count = resp.info.mem_dma_channels; + metadata->mem.lock_count = resp.info.mem_locks; + metadata->mem.event_reg_count = resp.info.mem_events; + + metadata->shim.row_count = resp.info.shim_rows; + metadata->shim.row_start = resp.info.shim_row_start; + metadata->shim.dma_channel_count = resp.info.shim_dma_channels; + metadata->shim.lock_count = resp.info.shim_locks; + metadata->shim.event_reg_count = resp.info.shim_events; + + return 0; +} + +int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev, + struct amdxdna_fw_ver *fw_ver) +{ + DECLARE_AIE2_MSG(firmware_version, MSG_OP_GET_FIRMWARE_VERSION); + int ret; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + fw_ver->major = resp.major; + fw_ver->minor = resp.minor; + fw_ver->sub = resp.sub; + fw_ver->build = resp.build; + + return 0; +} + +int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx) +{ + DECLARE_AIE2_MSG(create_ctx, MSG_OP_CREATE_CONTEXT); + struct amdxdna_dev *xdna = ndev->xdna; + struct xdna_mailbox_chann_res x2i; + struct xdna_mailbox_chann_res i2x; + struct cq_pair *cq_pair; + u32 intr_reg; + int ret; + + req.aie_type = 1; + req.start_col = hwctx->start_col; + req.num_col = hwctx->num_col; + req.num_cq_pairs_requested = 1; + req.pasid = hwctx->client->pasid; + req.context_priority = 2; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + hwctx->fw_ctx_id = resp.context_id; + WARN_ONCE(hwctx->fw_ctx_id == -1, "Unexpected context id"); + + cq_pair = &resp.cq_pair[0]; + x2i.mb_head_ptr_reg = AIE2_MBOX_OFF(ndev, cq_pair->x2i_q.head_addr); + x2i.mb_tail_ptr_reg = AIE2_MBOX_OFF(ndev, cq_pair->x2i_q.tail_addr); + x2i.rb_start_addr = AIE2_SRAM_OFF(ndev, cq_pair->x2i_q.buf_addr); + x2i.rb_size = cq_pair->x2i_q.buf_size; + + i2x.mb_head_ptr_reg = AIE2_MBOX_OFF(ndev, cq_pair->i2x_q.head_addr); + i2x.mb_tail_ptr_reg = AIE2_MBOX_OFF(ndev, cq_pair->i2x_q.tail_addr); + i2x.rb_start_addr = AIE2_SRAM_OFF(ndev, cq_pair->i2x_q.buf_addr); + i2x.rb_size = cq_pair->i2x_q.buf_size; + + ret = pci_irq_vector(to_pci_dev(xdna->ddev.dev), resp.msix_id); + if (ret == -EINVAL) { + XDNA_ERR(xdna, "not able to create channel"); + goto out_destroy_context; + } + + intr_reg = i2x.mb_head_ptr_reg + 4; + hwctx->priv->mbox_chann = xdna_mailbox_create_channel(ndev->mbox, &x2i, &i2x, + intr_reg, ret); + if (!hwctx->priv->mbox_chann) { + XDNA_ERR(xdna, "not able to create channel"); + ret = -EINVAL; + goto out_destroy_context; + } + + XDNA_DBG(xdna, "%s mailbox channel irq: %d, msix_id: %d", + hwctx->name, ret, resp.msix_id); + XDNA_DBG(xdna, "%s created fw ctx %d pasid %d", hwctx->name, + hwctx->fw_ctx_id, hwctx->client->pasid); + + return 0; + +out_destroy_context: + aie2_destroy_context(ndev, hwctx); + return ret; +} + +int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx) +{ + DECLARE_AIE2_MSG(destroy_ctx, MSG_OP_DESTROY_CONTEXT); + struct amdxdna_dev *xdna = ndev->xdna; + int ret; + + if (hwctx->fw_ctx_id == -1) + return 0; + + xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); + + req.context_id = hwctx->fw_ctx_id; + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + XDNA_WARN(xdna, "%s destroy context failed, ret %d", hwctx->name, ret); + + xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); + XDNA_DBG(xdna, "%s destroyed fw ctx %d", hwctx->name, + hwctx->fw_ctx_id); + hwctx->priv->mbox_chann = NULL; + hwctx->fw_ctx_id = -1; + + return ret; +} + +int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u64 size) +{ + DECLARE_AIE2_MSG(map_host_buffer, MSG_OP_MAP_HOST_BUFFER); + struct amdxdna_dev *xdna = ndev->xdna; + int ret; + + req.context_id = context_id; + req.buf_addr = addr; + req.buf_size = size; + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + XDNA_DBG(xdna, "fw ctx %d map host buf addr 0x%llx size 0x%llx", + context_id, addr, size); + + return 0; +} + +int aie2_query_status(struct amdxdna_dev_hdl *ndev, char __user *buf, + u32 size, u32 *cols_filled) +{ + DECLARE_AIE2_MSG(aie_column_info, MSG_OP_QUERY_COL_STATUS); + struct amdxdna_dev *xdna = ndev->xdna; + struct amdxdna_client *client; + struct amdxdna_hwctx *hwctx; + dma_addr_t dma_addr; + u32 aie_bitmap = 0; + u8 *buff_addr; + int next = 0; + int ret, idx; + + buff_addr = dma_alloc_noncoherent(xdna->ddev.dev, size, &dma_addr, + DMA_FROM_DEVICE, GFP_KERNEL); + if (!buff_addr) + return -ENOMEM; + + /* Go through each hardware context and mark the AIE columns that are active */ + list_for_each_entry(client, &xdna->client_list, node) { + idx = srcu_read_lock(&client->hwctx_srcu); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) + aie_bitmap |= amdxdna_hwctx_col_map(hwctx); + srcu_read_unlock(&client->hwctx_srcu, idx); + } + + *cols_filled = 0; + req.dump_buff_addr = dma_addr; + req.dump_buff_size = size; + req.num_cols = hweight32(aie_bitmap); + req.aie_bitmap = aie_bitmap; + + drm_clflush_virt_range(buff_addr, size); /* device can access */ + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) { + XDNA_ERR(xdna, "Error during NPU query, status %d", ret); + goto fail; + } + + if (resp.status != AIE2_STATUS_SUCCESS) { + XDNA_ERR(xdna, "Query NPU status failed, status 0x%x", resp.status); + ret = -EINVAL; + goto fail; + } + XDNA_DBG(xdna, "Query NPU status completed"); + + if (size < resp.size) { + ret = -EINVAL; + XDNA_ERR(xdna, "Bad buffer size. Available: %u. Needs: %u", size, resp.size); + goto fail; + } + + if (copy_to_user(buf, buff_addr, resp.size)) { + ret = -EFAULT; + XDNA_ERR(xdna, "Failed to copy NPU status to user space"); + goto fail; + } + + *cols_filled = aie_bitmap; + +fail: + dma_free_noncoherent(xdna->ddev.dev, size, buff_addr, dma_addr, DMA_FROM_DEVICE); + return ret; +} + +int aie2_register_asyn_event_msg(struct amdxdna_dev_hdl *ndev, dma_addr_t addr, u32 size, + void *handle, int (*cb)(void*, const u32 *, size_t)) +{ + struct async_event_msg_req req = { 0 }; + struct xdna_mailbox_msg msg = { + .send_data = (u8 *)&req, + .send_size = sizeof(req), + .handle = handle, + .opcode = MSG_OP_REGISTER_ASYNC_EVENT_MSG, + .notify_cb = cb, + }; + + req.buf_addr = addr; + req.buf_size = size; + + XDNA_DBG(ndev->xdna, "Register addr 0x%llx size 0x%x", addr, size); + return xdna_mailbox_send_msg(ndev->mgmt_chann, &msg, TX_TIMEOUT); +} + +int aie2_config_cu(struct amdxdna_hwctx *hwctx) +{ + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_dev *xdna = hwctx->client->xdna; + u32 shift = xdna->dev_info->dev_mem_buf_shift; + DECLARE_AIE2_MSG(config_cu, MSG_OP_CONFIG_CU); + struct drm_gem_object *gobj; + struct amdxdna_gem_obj *abo; + int ret, i; + + if (!chann) + return -ENODEV; + + if (hwctx->cus->num_cus > MAX_NUM_CUS) { + XDNA_DBG(xdna, "Exceed maximum CU %d", MAX_NUM_CUS); + return -EINVAL; + } + + for (i = 0; i < hwctx->cus->num_cus; i++) { + struct amdxdna_cu_config *cu = &hwctx->cus->cu_configs[i]; + + gobj = drm_gem_object_lookup(hwctx->client->filp, cu->cu_bo); + if (!gobj) { + XDNA_ERR(xdna, "Lookup GEM object failed"); + return -EINVAL; + } + abo = to_xdna_obj(gobj); + + if (abo->type != AMDXDNA_BO_DEV) { + drm_gem_object_put(gobj); + XDNA_ERR(xdna, "Invalid BO type"); + return -EINVAL; + } + + req.cfgs[i] = FIELD_PREP(AIE2_MSG_CFG_CU_PDI_ADDR, + abo->mem.dev_addr >> shift); + req.cfgs[i] |= FIELD_PREP(AIE2_MSG_CFG_CU_FUNC, cu->cu_func); + XDNA_DBG(xdna, "CU %d full addr 0x%llx, cfg 0x%x", i, + abo->mem.dev_addr, req.cfgs[i]); + drm_gem_object_put(gobj); + } + req.num_cus = hwctx->cus->num_cus; + + ret = xdna_send_msg_wait(xdna, chann, &msg); + if (ret == -ETIME) + aie2_destroy_context(xdna->dev_handle, hwctx); + + if (resp.status == AIE2_STATUS_SUCCESS) { + XDNA_DBG(xdna, "Configure %d CUs, ret %d", req.num_cus, ret); + return 0; + } + + XDNA_ERR(xdna, "Command opcode 0x%x failed, status 0x%x ret %d", + msg.opcode, resp.status, ret); + return ret; +} + +int aie2_execbuf(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + union { + struct execute_buffer_req ebuf; + struct exec_dpu_req dpu; + } req; + struct xdna_mailbox_msg msg; + u32 payload_len; + void *payload; + int cu_idx; + int ret; + u32 op; + + if (!chann) + return -ENODEV; + + payload = amdxdna_cmd_get_payload(cmd_abo, &payload_len); + if (!payload) { + XDNA_ERR(xdna, "Invalid command, cannot get payload"); + return -EINVAL; + } + + cu_idx = amdxdna_cmd_get_cu_idx(cmd_abo); + if (cu_idx < 0) { + XDNA_DBG(xdna, "Invalid cu idx"); + return -EINVAL; + } + + op = amdxdna_cmd_get_op(cmd_abo); + switch (op) { + case ERT_START_CU: + if (unlikely(payload_len > sizeof(req.ebuf.payload))) + XDNA_DBG(xdna, "Invalid ebuf payload len: %d", payload_len); + req.ebuf.cu_idx = cu_idx; + memcpy(req.ebuf.payload, payload, sizeof(req.ebuf.payload)); + msg.send_size = sizeof(req.ebuf); + msg.opcode = MSG_OP_EXECUTE_BUFFER_CF; + break; + case ERT_START_NPU: { + struct amdxdna_cmd_start_npu *sn = payload; + + if (unlikely(payload_len - sizeof(*sn) > sizeof(req.dpu.payload))) + XDNA_DBG(xdna, "Invalid dpu payload len: %d", payload_len); + req.dpu.inst_buf_addr = sn->buffer; + req.dpu.inst_size = sn->buffer_size; + req.dpu.inst_prop_cnt = sn->prop_count; + req.dpu.cu_idx = cu_idx; + memcpy(req.dpu.payload, sn->prop_args, sizeof(req.dpu.payload)); + msg.send_size = sizeof(req.dpu); + msg.opcode = MSG_OP_EXEC_DPU; + break; + } + default: + XDNA_DBG(xdna, "Invalid ERT cmd op code: %d", op); + return -EINVAL; + } + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + print_hex_dump_debug("cmd: ", DUMP_PREFIX_OFFSET, 16, 4, &req, + 0x40, false); + + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(xdna, "Send message failed"); + return ret; + } + + return 0; +} + +static int +aie2_cmdlist_fill_one_slot_cf(void *cmd_buf, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + struct cmd_chain_slot_execbuf_cf *buf = cmd_buf + offset; + int cu_idx = amdxdna_cmd_get_cu_idx(abo); + u32 payload_len; + void *payload; + + if (cu_idx < 0) + return -EINVAL; + + payload = amdxdna_cmd_get_payload(abo, &payload_len); + if (!payload) + return -EINVAL; + + if (!slot_cf_has_space(offset, payload_len)) + return -ENOSPC; + + buf->cu_idx = cu_idx; + buf->arg_cnt = payload_len / sizeof(u32); + memcpy(buf->args, payload, payload_len); + /* Accurate buf size to hint firmware to do necessary copy */ + *size = sizeof(*buf) + payload_len; + return 0; +} + +static int +aie2_cmdlist_fill_one_slot_dpu(void *cmd_buf, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + struct cmd_chain_slot_dpu *buf = cmd_buf + offset; + int cu_idx = amdxdna_cmd_get_cu_idx(abo); + struct amdxdna_cmd_start_npu *sn; + u32 payload_len; + void *payload; + u32 arg_sz; + + if (cu_idx < 0) + return -EINVAL; + + payload = amdxdna_cmd_get_payload(abo, &payload_len); + if (!payload) + return -EINVAL; + sn = payload; + arg_sz = payload_len - sizeof(*sn); + if (payload_len < sizeof(*sn) || arg_sz > MAX_DPU_ARGS_SIZE) + return -EINVAL; + + if (!slot_dpu_has_space(offset, arg_sz)) + return -ENOSPC; + + buf->inst_buf_addr = sn->buffer; + buf->inst_size = sn->buffer_size; + buf->inst_prop_cnt = sn->prop_count; + buf->cu_idx = cu_idx; + buf->arg_cnt = arg_sz / sizeof(u32); + memcpy(buf->args, sn->prop_args, arg_sz); + + /* Accurate buf size to hint firmware to do necessary copy */ + *size += sizeof(*buf) + arg_sz; + return 0; +} + +static int +aie2_cmdlist_fill_one_slot(u32 op, struct amdxdna_gem_obj *cmdbuf_abo, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + u32 this_op = amdxdna_cmd_get_op(abo); + void *cmd_buf = cmdbuf_abo->mem.kva; + int ret; + + if (this_op != op) { + ret = -EINVAL; + goto done; + } + + switch (op) { + case ERT_START_CU: + ret = aie2_cmdlist_fill_one_slot_cf(cmd_buf, offset, abo, size); + break; + case ERT_START_NPU: + ret = aie2_cmdlist_fill_one_slot_dpu(cmd_buf, offset, abo, size); + break; + default: + ret = -EOPNOTSUPP; + } + +done: + if (ret) { + XDNA_ERR(abo->client->xdna, "Can't fill slot for cmd op %d ret %d", + op, ret); + } + return ret; +} + +static inline struct amdxdna_gem_obj * +aie2_cmdlist_get_cmd_buf(struct amdxdna_sched_job *job) +{ + int idx = get_job_idx(job->seq); + + return job->hwctx->priv->cmd_buf[idx]; +} + +static void +aie2_cmdlist_prepare_request(struct cmd_chain_req *req, + struct amdxdna_gem_obj *cmdbuf_abo, u32 size, u32 cnt) +{ + req->buf_addr = cmdbuf_abo->mem.dev_addr; + req->buf_size = size; + req->count = cnt; + drm_clflush_virt_range(cmdbuf_abo->mem.kva, size); + XDNA_DBG(cmdbuf_abo->client->xdna, "Command buf addr 0x%llx size 0x%x count %d", + req->buf_addr, size, cnt); +} + +static inline u32 +aie2_cmd_op_to_msg_op(u32 op) +{ + switch (op) { + case ERT_START_CU: + return MSG_OP_CHAIN_EXEC_BUFFER_CF; + case ERT_START_NPU: + return MSG_OP_CHAIN_EXEC_DPU; + default: + return MSG_OP_MAX_OPCODE; + } +} + +int aie2_cmdlist_multi_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct amdxdna_gem_obj *cmdbuf_abo = aie2_cmdlist_get_cmd_buf(job); + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_client *client = hwctx->client; + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + struct amdxdna_cmd_chain *payload; + struct xdna_mailbox_msg msg; + struct cmd_chain_req req; + u32 payload_len; + u32 offset = 0; + u32 size; + int ret; + u32 op; + u32 i; + + op = amdxdna_cmd_get_op(cmd_abo); + payload = amdxdna_cmd_get_payload(cmd_abo, &payload_len); + if (op != ERT_CMD_CHAIN || !payload || + payload_len < struct_size(payload, data, payload->command_count)) + return -EINVAL; + + for (i = 0; i < payload->command_count; i++) { + u32 boh = (u32)(payload->data[i]); + struct amdxdna_gem_obj *abo; + + abo = amdxdna_gem_get_obj(client, boh, AMDXDNA_BO_CMD); + if (!abo) { + XDNA_ERR(client->xdna, "Failed to find cmd BO %d", boh); + return -ENOENT; + } + + /* All sub-cmd should have same op, use the first one. */ + if (i == 0) + op = amdxdna_cmd_get_op(abo); + + ret = aie2_cmdlist_fill_one_slot(op, cmdbuf_abo, offset, abo, &size); + amdxdna_gem_put_obj(abo); + if (ret) + return -EINVAL; + + offset += size; + } + + /* The offset is the accumulated total size of the cmd buffer */ + aie2_cmdlist_prepare_request(&req, cmdbuf_abo, offset, payload->command_count); + + msg.opcode = aie2_cmd_op_to_msg_op(op); + if (msg.opcode == MSG_OP_MAX_OPCODE) + return -EOPNOTSUPP; + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + msg.send_size = sizeof(req); + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(hwctx->client->xdna, "Send message failed"); + return ret; + } + + return 0; +} + +int aie2_cmdlist_single_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct amdxdna_gem_obj *cmdbuf_abo = aie2_cmdlist_get_cmd_buf(job); + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + struct xdna_mailbox_msg msg; + struct cmd_chain_req req; + u32 size; + int ret; + u32 op; + + op = amdxdna_cmd_get_op(cmd_abo); + ret = aie2_cmdlist_fill_one_slot(op, cmdbuf_abo, 0, cmd_abo, &size); + if (ret) + return ret; + + aie2_cmdlist_prepare_request(&req, cmdbuf_abo, size, 1); + + msg.opcode = aie2_cmd_op_to_msg_op(op); + if (msg.opcode == MSG_OP_MAX_OPCODE) + return -EOPNOTSUPP; + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + msg.send_size = sizeof(req); + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(hwctx->client->xdna, "Send message failed"); + return ret; + } + + return 0; +} + +int aie2_sync_bo(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_gem_obj *abo = to_xdna_obj(job->bos[0]); + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct xdna_mailbox_msg msg; + struct sync_bo_req req; + int ret = 0; + + req.src_addr = 0; + req.dst_addr = abo->mem.dev_addr - hwctx->client->dev_heap->mem.dev_addr; + req.size = abo->mem.size; + + /* Device to Host */ + req.type = FIELD_PREP(AIE2_MSG_SYNC_BO_SRC_TYPE, SYNC_BO_DEV_MEM) | + FIELD_PREP(AIE2_MSG_SYNC_BO_DST_TYPE, SYNC_BO_HOST_MEM); + + XDNA_DBG(xdna, "sync %d bytes src(0x%llx) to dst(0x%llx) completed", + req.size, req.src_addr, req.dst_addr); + + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + msg.send_size = sizeof(req); + msg.opcode = MSG_OP_SYNC_BO; + + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(xdna, "Send message failed"); + return ret; + } + + return 0; +} diff --git a/drivers/accel/amdxdna/aie2_msg_priv.h b/drivers/accel/amdxdna/aie2_msg_priv.h new file mode 100644 index 000000000000..4e02e744b470 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_msg_priv.h @@ -0,0 +1,370 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_MSG_PRIV_H_ +#define _AIE2_MSG_PRIV_H_ + +enum aie2_msg_opcode { + MSG_OP_CREATE_CONTEXT = 0x2, + MSG_OP_DESTROY_CONTEXT = 0x3, + MSG_OP_SYNC_BO = 0x7, + MSG_OP_EXECUTE_BUFFER_CF = 0xC, + MSG_OP_QUERY_COL_STATUS = 0xD, + MSG_OP_QUERY_AIE_TILE_INFO = 0xE, + MSG_OP_QUERY_AIE_VERSION = 0xF, + MSG_OP_EXEC_DPU = 0x10, + MSG_OP_CONFIG_CU = 0x11, + MSG_OP_CHAIN_EXEC_BUFFER_CF = 0x12, + MSG_OP_CHAIN_EXEC_DPU = 0x13, + MSG_OP_MAX_XRT_OPCODE, + MSG_OP_SUSPEND = 0x101, + MSG_OP_RESUME = 0x102, + MSG_OP_ASSIGN_MGMT_PASID = 0x103, + MSG_OP_INVOKE_SELF_TEST = 0x104, + MSG_OP_MAP_HOST_BUFFER = 0x106, + MSG_OP_GET_FIRMWARE_VERSION = 0x108, + MSG_OP_SET_RUNTIME_CONFIG = 0x10A, + MSG_OP_GET_RUNTIME_CONFIG = 0x10B, + MSG_OP_REGISTER_ASYNC_EVENT_MSG = 0x10C, + MSG_OP_MAX_DRV_OPCODE, + MSG_OP_GET_PROTOCOL_VERSION = 0x301, + MSG_OP_MAX_OPCODE +}; + +enum aie2_msg_status { + AIE2_STATUS_SUCCESS = 0x0, + /* AIE Error codes */ + AIE2_STATUS_AIE_SATURATION_ERROR = 0x1000001, + AIE2_STATUS_AIE_FP_ERROR = 0x1000002, + AIE2_STATUS_AIE_STREAM_ERROR = 0x1000003, + AIE2_STATUS_AIE_ACCESS_ERROR = 0x1000004, + AIE2_STATUS_AIE_BUS_ERROR = 0x1000005, + AIE2_STATUS_AIE_INSTRUCTION_ERROR = 0x1000006, + AIE2_STATUS_AIE_ECC_ERROR = 0x1000007, + AIE2_STATUS_AIE_LOCK_ERROR = 0x1000008, + AIE2_STATUS_AIE_DMA_ERROR = 0x1000009, + AIE2_STATUS_AIE_MEM_PARITY_ERROR = 0x100000a, + AIE2_STATUS_AIE_PWR_CFG_ERROR = 0x100000b, + AIE2_STATUS_AIE_BACKTRACK_ERROR = 0x100000c, + AIE2_STATUS_MAX_AIE_STATUS_CODE, + /* MGMT ERT Error codes */ + AIE2_STATUS_MGMT_ERT_SELF_TEST_FAILURE = 0x2000001, + AIE2_STATUS_MGMT_ERT_HASH_MISMATCH, + AIE2_STATUS_MGMT_ERT_NOAVAIL, + AIE2_STATUS_MGMT_ERT_INVALID_PARAM, + AIE2_STATUS_MGMT_ERT_ENTER_SUSPEND_FAILURE, + AIE2_STATUS_MGMT_ERT_BUSY, + AIE2_STATUS_MGMT_ERT_APPLICATION_ACTIVE, + MAX_MGMT_ERT_STATUS_CODE, + /* APP ERT Error codes */ + AIE2_STATUS_APP_ERT_FIRST_ERROR = 0x3000001, + AIE2_STATUS_APP_INVALID_INSTR, + AIE2_STATUS_APP_LOAD_PDI_FAIL, + MAX_APP_ERT_STATUS_CODE, + /* NPU RTOS Error Codes */ + AIE2_STATUS_INVALID_INPUT_BUFFER = 0x4000001, + AIE2_STATUS_INVALID_COMMAND, + AIE2_STATUS_INVALID_PARAM, + AIE2_STATUS_INVALID_OPERATION = 0x4000006, + AIE2_STATUS_ASYNC_EVENT_MSGS_FULL, + AIE2_STATUS_MAX_RTOS_STATUS_CODE, + MAX_AIE2_STATUS_CODE +}; + +struct assign_mgmt_pasid_req { + __u16 pasid; + __u16 reserved; +} __packed; + +struct assign_mgmt_pasid_resp { + enum aie2_msg_status status; +} __packed; + +struct map_host_buffer_req { + __u32 context_id; + __u64 buf_addr; + __u64 buf_size; +} __packed; + +struct map_host_buffer_resp { + enum aie2_msg_status status; +} __packed; + +#define MAX_CQ_PAIRS 2 +struct cq_info { + __u32 head_addr; + __u32 tail_addr; + __u32 buf_addr; + __u32 buf_size; +}; + +struct cq_pair { + struct cq_info x2i_q; + struct cq_info i2x_q; +}; + +struct create_ctx_req { + __u32 aie_type; + __u8 start_col; + __u8 num_col; + __u16 reserved; + __u8 num_cq_pairs_requested; + __u8 reserved1; + __u16 pasid; + __u32 pad[2]; + __u32 sec_comm_target_type; + __u32 context_priority; +} __packed; + +struct create_ctx_resp { + enum aie2_msg_status status; + __u32 context_id; + __u16 msix_id; + __u8 num_cq_pairs_allocated; + __u8 reserved; + struct cq_pair cq_pair[MAX_CQ_PAIRS]; +} __packed; + +struct destroy_ctx_req { + __u32 context_id; +} __packed; + +struct destroy_ctx_resp { + enum aie2_msg_status status; +} __packed; + +struct execute_buffer_req { + __u32 cu_idx; + __u32 payload[19]; +} __packed; + +struct exec_dpu_req { + __u64 inst_buf_addr; + __u32 inst_size; + __u32 inst_prop_cnt; + __u32 cu_idx; + __u32 payload[35]; +} __packed; + +struct execute_buffer_resp { + enum aie2_msg_status status; +} __packed; + +struct aie_tile_info { + __u32 size; + __u16 major; + __u16 minor; + __u16 cols; + __u16 rows; + __u16 core_rows; + __u16 mem_rows; + __u16 shim_rows; + __u16 core_row_start; + __u16 mem_row_start; + __u16 shim_row_start; + __u16 core_dma_channels; + __u16 mem_dma_channels; + __u16 shim_dma_channels; + __u16 core_locks; + __u16 mem_locks; + __u16 shim_locks; + __u16 core_events; + __u16 mem_events; + __u16 shim_events; + __u16 reserved; +}; + +struct aie_tile_info_req { + __u32 reserved; +} __packed; + +struct aie_tile_info_resp { + enum aie2_msg_status status; + struct aie_tile_info info; +} __packed; + +struct aie_version_info_req { + __u32 reserved; +} __packed; + +struct aie_version_info_resp { + enum aie2_msg_status status; + __u16 major; + __u16 minor; +} __packed; + +struct aie_column_info_req { + __u64 dump_buff_addr; + __u32 dump_buff_size; + __u32 num_cols; + __u32 aie_bitmap; +} __packed; + +struct aie_column_info_resp { + enum aie2_msg_status status; + __u32 size; +} __packed; + +struct suspend_req { + __u32 place_holder; +} __packed; + +struct suspend_resp { + enum aie2_msg_status status; +} __packed; + +struct resume_req { + __u32 place_holder; +} __packed; + +struct resume_resp { + enum aie2_msg_status status; +} __packed; + +struct check_header_hash_req { + __u64 hash_high; + __u64 hash_low; +} __packed; + +struct check_header_hash_resp { + enum aie2_msg_status status; +} __packed; + +struct query_error_req { + __u64 buf_addr; + __u32 buf_size; + __u32 next_row; + __u32 next_column; + __u32 next_module; +} __packed; + +struct query_error_resp { + enum aie2_msg_status status; + __u32 num_err; + __u32 has_next_err; + __u32 next_row; + __u32 next_column; + __u32 next_module; +} __packed; + +struct protocol_version_req { + __u32 reserved; +} __packed; + +struct protocol_version_resp { + enum aie2_msg_status status; + __u32 major; + __u32 minor; +} __packed; + +struct firmware_version_req { + __u32 reserved; +} __packed; + +struct firmware_version_resp { + enum aie2_msg_status status; + __u32 major; + __u32 minor; + __u32 sub; + __u32 build; +} __packed; + +#define MAX_NUM_CUS 32 +#define AIE2_MSG_CFG_CU_PDI_ADDR GENMASK(16, 0) +#define AIE2_MSG_CFG_CU_FUNC GENMASK(24, 17) +struct config_cu_req { + __u32 num_cus; + __u32 cfgs[MAX_NUM_CUS]; +} __packed; + +struct config_cu_resp { + enum aie2_msg_status status; +} __packed; + +struct set_runtime_cfg_req { + __u32 type; + __u64 value; +} __packed; + +struct set_runtime_cfg_resp { + enum aie2_msg_status status; +} __packed; + +struct get_runtime_cfg_req { + __u32 type; +} __packed; + +struct get_runtime_cfg_resp { + enum aie2_msg_status status; + __u64 value; +} __packed; + +enum async_event_type { + ASYNC_EVENT_TYPE_AIE_ERROR, + ASYNC_EVENT_TYPE_EXCEPTION, + MAX_ASYNC_EVENT_TYPE +}; + +#define ASYNC_BUF_SIZE SZ_8K +struct async_event_msg_req { + __u64 buf_addr; + __u32 buf_size; +} __packed; + +struct async_event_msg_resp { + enum aie2_msg_status status; + enum async_event_type type; +} __packed; + +#define MAX_CHAIN_CMDBUF_SIZE SZ_4K +#define slot_cf_has_space(offset, payload_size) \ + (MAX_CHAIN_CMDBUF_SIZE - ((offset) + (payload_size)) > \ + offsetof(struct cmd_chain_slot_execbuf_cf, args[0])) +struct cmd_chain_slot_execbuf_cf { + __u32 cu_idx; + __u32 arg_cnt; + __u32 args[] __counted_by(arg_cnt); +}; + +#define slot_dpu_has_space(offset, payload_size) \ + (MAX_CHAIN_CMDBUF_SIZE - ((offset) + (payload_size)) > \ + offsetof(struct cmd_chain_slot_dpu, args[0])) +struct cmd_chain_slot_dpu { + __u64 inst_buf_addr; + __u32 inst_size; + __u32 inst_prop_cnt; + __u32 cu_idx; + __u32 arg_cnt; +#define MAX_DPU_ARGS_SIZE (34 * sizeof(__u32)) + __u32 args[] __counted_by(arg_cnt); +}; + +struct cmd_chain_req { + __u64 buf_addr; + __u32 buf_size; + __u32 count; +} __packed; + +struct cmd_chain_resp { + enum aie2_msg_status status; + __u32 fail_cmd_idx; + enum aie2_msg_status fail_cmd_status; +} __packed; + +#define AIE2_MSG_SYNC_BO_SRC_TYPE GENMASK(3, 0) +#define AIE2_MSG_SYNC_BO_DST_TYPE GENMASK(7, 4) +struct sync_bo_req { + __u64 src_addr; + __u64 dst_addr; + __u32 size; +#define SYNC_BO_DEV_MEM 0 +#define SYNC_BO_HOST_MEM 2 + __u32 type; +} __packed; + +struct sync_bo_resp { + enum aie2_msg_status status; +} __packed; +#endif /* _AIE2_MSG_PRIV_H_ */ diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c new file mode 100644 index 000000000000..349ada697e48 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -0,0 +1,762 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_device.h> +#include <drm/drm_drv.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/drm_managed.h> +#include <drm/drm_print.h> +#include <drm/gpu_scheduler.h> +#include <linux/errno.h> +#include <linux/firmware.h> +#include <linux/iommu.h> +#include <linux/iopoll.h> +#include <linux/pci.h> + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "aie2_solver.h" +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +int aie2_max_col = XRS_MAX_COL; +module_param(aie2_max_col, uint, 0600); +MODULE_PARM_DESC(aie2_max_col, "Maximum column could be used"); + +/* + * The management mailbox channel is allocated by firmware. + * The related register and ring buffer information is on SRAM BAR. + * This struct is the register layout. + */ +struct mgmt_mbox_chann_info { + u32 x2i_tail; + u32 x2i_head; + u32 x2i_buf; + u32 x2i_buf_sz; + u32 i2x_tail; + u32 i2x_head; + u32 i2x_buf; + u32 i2x_buf_sz; +}; + +static void aie2_dump_chann_info_debug(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna = ndev->xdna; + + XDNA_DBG(xdna, "i2x tail 0x%x", ndev->mgmt_i2x.mb_tail_ptr_reg); + XDNA_DBG(xdna, "i2x head 0x%x", ndev->mgmt_i2x.mb_head_ptr_reg); + XDNA_DBG(xdna, "i2x ringbuf 0x%x", ndev->mgmt_i2x.rb_start_addr); + XDNA_DBG(xdna, "i2x rsize 0x%x", ndev->mgmt_i2x.rb_size); + XDNA_DBG(xdna, "x2i tail 0x%x", ndev->mgmt_x2i.mb_tail_ptr_reg); + XDNA_DBG(xdna, "x2i head 0x%x", ndev->mgmt_x2i.mb_head_ptr_reg); + XDNA_DBG(xdna, "x2i ringbuf 0x%x", ndev->mgmt_x2i.rb_start_addr); + XDNA_DBG(xdna, "x2i rsize 0x%x", ndev->mgmt_x2i.rb_size); + XDNA_DBG(xdna, "x2i chann index 0x%x", ndev->mgmt_chan_idx); +} + +static int aie2_get_mgmt_chann_info(struct amdxdna_dev_hdl *ndev) +{ + struct mgmt_mbox_chann_info info_regs; + struct xdna_mailbox_chann_res *i2x; + struct xdna_mailbox_chann_res *x2i; + u32 addr, off; + u32 *reg; + int ret; + int i; + + /* + * Once firmware is alive, it will write management channel + * information in SRAM BAR and write the address of that information + * at FW_ALIVE_OFF offset in SRMA BAR. + * + * Read a non-zero value from FW_ALIVE_OFF implies that firmware + * is alive. + */ + ret = readx_poll_timeout(readl, SRAM_GET_ADDR(ndev, FW_ALIVE_OFF), + addr, addr, AIE2_INTERVAL, AIE2_TIMEOUT); + if (ret || !addr) + return -ETIME; + + off = AIE2_SRAM_OFF(ndev, addr); + reg = (u32 *)&info_regs; + for (i = 0; i < sizeof(info_regs) / sizeof(u32); i++) + reg[i] = readl(ndev->sram_base + off + i * sizeof(u32)); + + i2x = &ndev->mgmt_i2x; + x2i = &ndev->mgmt_x2i; + + i2x->mb_head_ptr_reg = AIE2_MBOX_OFF(ndev, info_regs.i2x_head); + i2x->mb_tail_ptr_reg = AIE2_MBOX_OFF(ndev, info_regs.i2x_tail); + i2x->rb_start_addr = AIE2_SRAM_OFF(ndev, info_regs.i2x_buf); + i2x->rb_size = info_regs.i2x_buf_sz; + + x2i->mb_head_ptr_reg = AIE2_MBOX_OFF(ndev, info_regs.x2i_head); + x2i->mb_tail_ptr_reg = AIE2_MBOX_OFF(ndev, info_regs.x2i_tail); + x2i->rb_start_addr = AIE2_SRAM_OFF(ndev, info_regs.x2i_buf); + x2i->rb_size = info_regs.x2i_buf_sz; + ndev->mgmt_chan_idx = CHANN_INDEX(ndev, x2i->rb_start_addr); + + aie2_dump_chann_info_debug(ndev); + + /* Must clear address at FW_ALIVE_OFF */ + writel(0, SRAM_GET_ADDR(ndev, FW_ALIVE_OFF)); + + return 0; +} + +static int aie2_runtime_cfg(struct amdxdna_dev_hdl *ndev) +{ + const struct rt_config *cfg = &ndev->priv->rt_config; + u64 value; + int ret; + + ret = aie2_set_runtime_cfg(ndev, cfg->type, cfg->value); + if (ret) { + XDNA_ERR(ndev->xdna, "Set runtime type %d value %d failed", + cfg->type, cfg->value); + return ret; + } + + ret = aie2_get_runtime_cfg(ndev, cfg->type, &value); + if (ret) { + XDNA_ERR(ndev->xdna, "Get runtime cfg failed"); + return ret; + } + + if (value != cfg->value) + return -EINVAL; + + return 0; +} + +static int aie2_xdna_reset(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret = aie2_suspend_fw(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Suspend firmware failed"); + return ret; + } + + ret = aie2_resume_fw(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Resume firmware failed"); + return ret; + } + + return 0; +} + +static int aie2_mgmt_fw_init(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret = aie2_check_protocol_version(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Check header hash failed"); + return ret; + } + + ret = aie2_runtime_cfg(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Runtime config failed"); + return ret; + } + + ret = aie2_assign_mgmt_pasid(ndev, 0); + if (ret) { + XDNA_ERR(ndev->xdna, "Can not assign PASID"); + return ret; + } + + ret = aie2_xdna_reset(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Reset firmware failed"); + return ret; + } + + if (!ndev->async_events) + return 0; + + ret = aie2_error_async_events_send(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Send async events failed"); + return ret; + } + + return 0; +} + +static int aie2_mgmt_fw_query(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret = aie2_query_firmware_version(ndev, &ndev->xdna->fw_ver); + if (ret) { + XDNA_ERR(ndev->xdna, "query firmware version failed"); + return ret; + } + + ret = aie2_query_aie_version(ndev, &ndev->version); + if (ret) { + XDNA_ERR(ndev->xdna, "Query AIE version failed"); + return ret; + } + + ret = aie2_query_aie_metadata(ndev, &ndev->metadata); + if (ret) { + XDNA_ERR(ndev->xdna, "Query AIE metadata failed"); + return ret; + } + + return 0; +} + +static void aie2_mgmt_fw_fini(struct amdxdna_dev_hdl *ndev) +{ + if (aie2_suspend_fw(ndev)) + XDNA_ERR(ndev->xdna, "Suspend_fw failed"); + XDNA_DBG(ndev->xdna, "Firmware suspended"); +} + +static int aie2_xrs_load(void *cb_arg, struct xrs_action_load *action) +{ + struct amdxdna_hwctx *hwctx = cb_arg; + struct amdxdna_dev *xdna; + int ret; + + xdna = hwctx->client->xdna; + + hwctx->start_col = action->part.start_col; + hwctx->num_col = action->part.ncols; + ret = aie2_create_context(xdna->dev_handle, hwctx); + if (ret) + XDNA_ERR(xdna, "create context failed, ret %d", ret); + + return ret; +} + +static int aie2_xrs_unload(void *cb_arg) +{ + struct amdxdna_hwctx *hwctx = cb_arg; + struct amdxdna_dev *xdna; + int ret; + + xdna = hwctx->client->xdna; + + ret = aie2_destroy_context(xdna->dev_handle, hwctx); + if (ret) + XDNA_ERR(xdna, "destroy context failed, ret %d", ret); + + return ret; +} + +static struct xrs_action_ops aie2_xrs_actions = { + .load = aie2_xrs_load, + .unload = aie2_xrs_unload, +}; + +static void aie2_hw_stop(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev = to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev = xdna->dev_handle; + + aie2_mgmt_fw_fini(ndev); + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); + aie2_psp_stop(ndev->psp_hdl); + aie2_smu_fini(ndev); + pci_disable_device(pdev); +} + +static int aie2_hw_start(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev = to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev = xdna->dev_handle; + struct xdna_mailbox_res mbox_res; + u32 xdna_mailbox_intr_reg; + int mgmt_mb_irq, ret; + + ret = pci_enable_device(pdev); + if (ret) { + XDNA_ERR(xdna, "failed to enable device, ret %d", ret); + return ret; + } + pci_set_master(pdev); + + ret = aie2_smu_init(ndev); + if (ret) { + XDNA_ERR(xdna, "failed to init smu, ret %d", ret); + goto disable_dev; + } + + ret = aie2_psp_start(ndev->psp_hdl); + if (ret) { + XDNA_ERR(xdna, "failed to start psp, ret %d", ret); + goto fini_smu; + } + + ret = aie2_get_mgmt_chann_info(ndev); + if (ret) { + XDNA_ERR(xdna, "firmware is not alive"); + goto stop_psp; + } + + mbox_res.ringbuf_base = (u64)ndev->sram_base; + mbox_res.ringbuf_size = pci_resource_len(pdev, xdna->dev_info->sram_bar); + mbox_res.mbox_base = (u64)ndev->mbox_base; + mbox_res.mbox_size = MBOX_SIZE(ndev); + mbox_res.name = "xdna_mailbox"; + ndev->mbox = xdnam_mailbox_create(&xdna->ddev, &mbox_res); + if (!ndev->mbox) { + XDNA_ERR(xdna, "failed to create mailbox device"); + ret = -ENODEV; + goto stop_psp; + } + + mgmt_mb_irq = pci_irq_vector(pdev, ndev->mgmt_chan_idx); + if (mgmt_mb_irq < 0) { + ret = mgmt_mb_irq; + XDNA_ERR(xdna, "failed to alloc irq vector, ret %d", ret); + goto stop_psp; + } + + xdna_mailbox_intr_reg = ndev->mgmt_i2x.mb_head_ptr_reg + 4; + ndev->mgmt_chann = xdna_mailbox_create_channel(ndev->mbox, + &ndev->mgmt_x2i, + &ndev->mgmt_i2x, + xdna_mailbox_intr_reg, + mgmt_mb_irq); + if (!ndev->mgmt_chann) { + XDNA_ERR(xdna, "failed to create management mailbox channel"); + ret = -EINVAL; + goto stop_psp; + } + + ret = aie2_mgmt_fw_init(ndev); + if (ret) { + XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret); + goto destroy_mgmt_chann; + } + + return 0; + +destroy_mgmt_chann: + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); +stop_psp: + aie2_psp_stop(ndev->psp_hdl); +fini_smu: + aie2_smu_fini(ndev); +disable_dev: + pci_disable_device(pdev); + + return ret; +} + +static int aie2_init(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev = to_pci_dev(xdna->ddev.dev); + void __iomem *tbl[PCI_NUM_RESOURCES] = {0}; + struct init_config xrs_cfg = { 0 }; + struct amdxdna_dev_hdl *ndev; + struct psp_config psp_conf; + const struct firmware *fw; + unsigned long bars = 0; + int i, nvec, ret; + + ndev = drmm_kzalloc(&xdna->ddev, sizeof(*ndev), GFP_KERNEL); + if (!ndev) + return -ENOMEM; + + ndev->priv = xdna->dev_info->dev_priv; + ndev->xdna = xdna; + + ret = request_firmware(&fw, ndev->priv->fw_path, &pdev->dev); + if (ret) { + XDNA_ERR(xdna, "failed to request_firmware %s, ret %d", + ndev->priv->fw_path, ret); + return ret; + } + + ret = pcim_enable_device(pdev); + if (ret) { + XDNA_ERR(xdna, "pcim enable device failed, ret %d", ret); + goto release_fw; + } + + for (i = 0; i < PSP_MAX_REGS; i++) + set_bit(PSP_REG_BAR(ndev, i), &bars); + + set_bit(xdna->dev_info->sram_bar, &bars); + set_bit(xdna->dev_info->smu_bar, &bars); + set_bit(xdna->dev_info->mbox_bar, &bars); + + for (i = 0; i < PCI_NUM_RESOURCES; i++) { + if (!test_bit(i, &bars)) + continue; + tbl[i] = pcim_iomap(pdev, i, 0); + if (!tbl[i]) { + XDNA_ERR(xdna, "map bar %d failed", i); + ret = -ENOMEM; + goto release_fw; + } + } + + ndev->sram_base = tbl[xdna->dev_info->sram_bar]; + ndev->smu_base = tbl[xdna->dev_info->smu_bar]; + ndev->mbox_base = tbl[xdna->dev_info->mbox_bar]; + + ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)); + if (ret) { + XDNA_ERR(xdna, "Failed to set DMA mask: %d", ret); + goto release_fw; + } + + nvec = pci_msix_vec_count(pdev); + if (nvec <= 0) { + XDNA_ERR(xdna, "does not get number of interrupt vector"); + ret = -EINVAL; + goto release_fw; + } + + ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_MSIX); + if (ret < 0) { + XDNA_ERR(xdna, "failed to alloc irq vectors, ret %d", ret); + goto release_fw; + } + + ret = iommu_dev_enable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); + if (ret) { + XDNA_ERR(xdna, "Enable PASID failed, ret %d", ret); + goto free_irq; + } + + psp_conf.fw_size = fw->size; + psp_conf.fw_buf = fw->data; + for (i = 0; i < PSP_MAX_REGS; i++) + psp_conf.psp_regs[i] = tbl[PSP_REG_BAR(ndev, i)] + PSP_REG_OFF(ndev, i); + ndev->psp_hdl = aie2m_psp_create(&xdna->ddev, &psp_conf); + if (!ndev->psp_hdl) { + XDNA_ERR(xdna, "failed to create psp"); + ret = -ENOMEM; + goto disable_sva; + } + xdna->dev_handle = ndev; + + ret = aie2_hw_start(xdna); + if (ret) { + XDNA_ERR(xdna, "start npu failed, ret %d", ret); + goto disable_sva; + } + + ret = aie2_mgmt_fw_query(ndev); + if (ret) { + XDNA_ERR(xdna, "Query firmware failed, ret %d", ret); + goto stop_hw; + } + ndev->total_col = min(aie2_max_col, ndev->metadata.cols); + + xrs_cfg.clk_list.num_levels = 3; + xrs_cfg.clk_list.cu_clk_list[0] = 0; + xrs_cfg.clk_list.cu_clk_list[1] = 800; + xrs_cfg.clk_list.cu_clk_list[2] = 1000; + xrs_cfg.sys_eff_factor = 1; + xrs_cfg.ddev = &xdna->ddev; + xrs_cfg.actions = &aie2_xrs_actions; + xrs_cfg.total_col = ndev->total_col; + + xdna->xrs_hdl = xrsm_init(&xrs_cfg); + if (!xdna->xrs_hdl) { + XDNA_ERR(xdna, "Initialize resolver failed"); + ret = -EINVAL; + goto stop_hw; + } + + ret = aie2_error_async_events_alloc(ndev); + if (ret) { + XDNA_ERR(xdna, "Allocate async events failed, ret %d", ret); + goto stop_hw; + } + + ret = aie2_error_async_events_send(ndev); + if (ret) { + XDNA_ERR(xdna, "Send async events failed, ret %d", ret); + goto async_event_free; + } + + /* Issue a command to make sure firmware handled async events */ + ret = aie2_query_firmware_version(ndev, &ndev->xdna->fw_ver); + if (ret) { + XDNA_ERR(xdna, "Re-query firmware version failed"); + goto async_event_free; + } + + release_firmware(fw); + return 0; + +async_event_free: + aie2_error_async_events_free(ndev); +stop_hw: + aie2_hw_stop(xdna); +disable_sva: + iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); +free_irq: + pci_free_irq_vectors(pdev); +release_fw: + release_firmware(fw); + + return ret; +} + +static void aie2_fini(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev = to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev = xdna->dev_handle; + + aie2_hw_stop(xdna); + aie2_error_async_events_free(ndev); + iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); + pci_free_irq_vectors(pdev); +} + +static int aie2_get_aie_status(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_aie_status status; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_dev_hdl *ndev; + int ret; + + ndev = xdna->dev_handle; + if (copy_from_user(&status, u64_to_user_ptr(args->buffer), sizeof(status))) { + XDNA_ERR(xdna, "Failed to copy AIE request into kernel"); + return -EFAULT; + } + + if (ndev->metadata.cols * ndev->metadata.size < status.buffer_size) { + XDNA_ERR(xdna, "Invalid buffer size. Given Size: %u. Need Size: %u.", + status.buffer_size, ndev->metadata.cols * ndev->metadata.size); + return -EINVAL; + } + + ret = aie2_query_status(ndev, u64_to_user_ptr(status.buffer), + status.buffer_size, &status.cols_filled); + if (ret) { + XDNA_ERR(xdna, "Failed to get AIE status info. Ret: %d", ret); + return ret; + } + + if (copy_to_user(u64_to_user_ptr(args->buffer), &status, sizeof(status))) { + XDNA_ERR(xdna, "Failed to copy AIE request info to user space"); + return -EFAULT; + } + + return 0; +} + +static int aie2_get_aie_metadata(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_aie_metadata *meta; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_dev_hdl *ndev; + int ret = 0; + + ndev = xdna->dev_handle; + meta = kzalloc(sizeof(*meta), GFP_KERNEL); + if (!meta) + return -ENOMEM; + + meta->col_size = ndev->metadata.size; + meta->cols = ndev->metadata.cols; + meta->rows = ndev->metadata.rows; + + meta->version.major = ndev->metadata.version.major; + meta->version.minor = ndev->metadata.version.minor; + + meta->core.row_count = ndev->metadata.core.row_count; + meta->core.row_start = ndev->metadata.core.row_start; + meta->core.dma_channel_count = ndev->metadata.core.dma_channel_count; + meta->core.lock_count = ndev->metadata.core.lock_count; + meta->core.event_reg_count = ndev->metadata.core.event_reg_count; + + meta->mem.row_count = ndev->metadata.mem.row_count; + meta->mem.row_start = ndev->metadata.mem.row_start; + meta->mem.dma_channel_count = ndev->metadata.mem.dma_channel_count; + meta->mem.lock_count = ndev->metadata.mem.lock_count; + meta->mem.event_reg_count = ndev->metadata.mem.event_reg_count; + + meta->shim.row_count = ndev->metadata.shim.row_count; + meta->shim.row_start = ndev->metadata.shim.row_start; + meta->shim.dma_channel_count = ndev->metadata.shim.dma_channel_count; + meta->shim.lock_count = ndev->metadata.shim.lock_count; + meta->shim.event_reg_count = ndev->metadata.shim.event_reg_count; + + if (copy_to_user(u64_to_user_ptr(args->buffer), meta, sizeof(*meta))) + ret = -EFAULT; + + kfree(meta); + return ret; +} + +static int aie2_get_aie_version(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_aie_version version; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_dev_hdl *ndev; + + ndev = xdna->dev_handle; + version.major = ndev->version.major; + version.minor = ndev->version.minor; + + if (copy_to_user(u64_to_user_ptr(args->buffer), &version, sizeof(version))) + return -EFAULT; + + return 0; +} + +static int aie2_get_clock_metadata(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_clock_metadata *clock; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_dev_hdl *ndev; + int ret = 0; + + ndev = xdna->dev_handle; + clock = kzalloc(sizeof(*clock), GFP_KERNEL); + if (!clock) + return -ENOMEM; + + memcpy(clock->mp_npu_clock.name, ndev->mp_npu_clock.name, + sizeof(clock->mp_npu_clock.name)); + clock->mp_npu_clock.freq_mhz = ndev->mp_npu_clock.freq_mhz; + memcpy(clock->h_clock.name, ndev->h_clock.name, sizeof(clock->h_clock.name)); + clock->h_clock.freq_mhz = ndev->h_clock.freq_mhz; + + if (copy_to_user(u64_to_user_ptr(args->buffer), clock, sizeof(*clock))) + ret = -EFAULT; + + kfree(clock); + return ret; +} + +static int aie2_get_hwctx_status(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_hwctx __user *buf; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_drm_query_hwctx *tmp; + struct amdxdna_client *tmp_client; + struct amdxdna_hwctx *hwctx; + bool overflow = false; + u32 req_bytes = 0; + u32 hw_i = 0; + int ret = 0; + int next; + int idx; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + + tmp = kzalloc(sizeof(*tmp), GFP_KERNEL); + if (!tmp) + return -ENOMEM; + + buf = u64_to_user_ptr(args->buffer); + list_for_each_entry(tmp_client, &xdna->client_list, node) { + idx = srcu_read_lock(&tmp_client->hwctx_srcu); + next = 0; + idr_for_each_entry_continue(&tmp_client->hwctx_idr, hwctx, next) { + req_bytes += sizeof(*tmp); + if (args->buffer_size < req_bytes) { + /* Continue iterating to get the required size */ + overflow = true; + continue; + } + + memset(tmp, 0, sizeof(*tmp)); + tmp->pid = tmp_client->pid; + tmp->context_id = hwctx->id; + tmp->start_col = hwctx->start_col; + tmp->num_col = hwctx->num_col; + tmp->command_submissions = hwctx->priv->seq; + tmp->command_completions = hwctx->priv->completed; + + if (copy_to_user(&buf[hw_i], tmp, sizeof(*tmp))) { + ret = -EFAULT; + srcu_read_unlock(&tmp_client->hwctx_srcu, idx); + goto out; + } + hw_i++; + } + srcu_read_unlock(&tmp_client->hwctx_srcu, idx); + } + + if (overflow) { + XDNA_ERR(xdna, "Invalid buffer size. Given: %u Need: %u.", + args->buffer_size, req_bytes); + ret = -EINVAL; + } + +out: + kfree(tmp); + args->buffer_size = req_bytes; + return ret; +} + +static int aie2_get_info(struct amdxdna_client *client, struct amdxdna_drm_get_info *args) +{ + struct amdxdna_dev *xdna = client->xdna; + int ret, idx; + + if (!drm_dev_enter(&xdna->ddev, &idx)) + return -ENODEV; + + switch (args->param) { + case DRM_AMDXDNA_QUERY_AIE_STATUS: + ret = aie2_get_aie_status(client, args); + break; + case DRM_AMDXDNA_QUERY_AIE_METADATA: + ret = aie2_get_aie_metadata(client, args); + break; + case DRM_AMDXDNA_QUERY_AIE_VERSION: + ret = aie2_get_aie_version(client, args); + break; + case DRM_AMDXDNA_QUERY_CLOCK_METADATA: + ret = aie2_get_clock_metadata(client, args); + break; + case DRM_AMDXDNA_QUERY_HW_CONTEXTS: + ret = aie2_get_hwctx_status(client, args); + break; + default: + XDNA_ERR(xdna, "Not supported request parameter %u", args->param); + ret = -EOPNOTSUPP; + } + XDNA_DBG(xdna, "Got param %d", args->param); + + drm_dev_exit(idx); + return ret; +} + +const struct amdxdna_dev_ops aie2_ops = { + .init = aie2_init, + .fini = aie2_fini, + .resume = aie2_hw_start, + .suspend = aie2_hw_stop, + .get_aie_info = aie2_get_info, + .hwctx_init = aie2_hwctx_init, + .hwctx_fini = aie2_hwctx_fini, + .hwctx_config = aie2_hwctx_config, + .cmd_submit = aie2_cmd_submit, + .hmm_invalidate = aie2_hmm_invalidate, + .hwctx_suspend = aie2_hwctx_suspend, + .hwctx_resume = aie2_hwctx_resume, +}; diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h new file mode 100644 index 000000000000..6a2686255c9c --- /dev/null +++ b/drivers/accel/amdxdna/aie2_pci.h @@ -0,0 +1,259 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_PCI_H_ +#define _AIE2_PCI_H_ + +#include <linux/semaphore.h> + +#include "amdxdna_mailbox.h" + +#define AIE2_INTERVAL 20000 /* us */ +#define AIE2_TIMEOUT 1000000 /* us */ + +/* Firmware determines device memory base address and size */ +#define AIE2_DEVM_BASE 0x4000000 +#define AIE2_DEVM_SIZE SZ_64M + +#define NDEV2PDEV(ndev) (to_pci_dev((ndev)->xdna->ddev.dev)) + +#define AIE2_SRAM_OFF(ndev, addr) ((addr) - (ndev)->priv->sram_dev_addr) +#define AIE2_MBOX_OFF(ndev, addr) ((addr) - (ndev)->priv->mbox_dev_addr) + +#define PSP_REG_BAR(ndev, idx) ((ndev)->priv->psp_regs_off[(idx)].bar_idx) +#define PSP_REG_OFF(ndev, idx) ((ndev)->priv->psp_regs_off[(idx)].offset) +#define SRAM_REG_OFF(ndev, idx) ((ndev)->priv->sram_offs[(idx)].offset) + +#define SMU_REG(ndev, idx) \ +({ \ + typeof(ndev) _ndev = ndev; \ + ((_ndev)->smu_base + (_ndev)->priv->smu_regs_off[(idx)].offset); \ +}) +#define SRAM_GET_ADDR(ndev, idx) \ +({ \ + typeof(ndev) _ndev = ndev; \ + ((_ndev)->sram_base + SRAM_REG_OFF((_ndev), (idx))); \ +}) + +#define CHAN_SLOT_SZ SZ_8K +#define CHANN_INDEX(ndev, rbuf_off) \ + (((rbuf_off) - SRAM_REG_OFF((ndev), MBOX_CHANN_OFF)) / CHAN_SLOT_SZ) + +#define MBOX_SIZE(ndev) \ +({ \ + typeof(ndev) _ndev = (ndev); \ + ((_ndev)->priv->mbox_size) ? (_ndev)->priv->mbox_size : \ + pci_resource_len(NDEV2PDEV(_ndev), (_ndev)->xdna->dev_info->mbox_bar); \ +}) + +#define SMU_MPNPUCLK_FREQ_MAX(ndev) ((ndev)->priv->smu_mpnpuclk_freq_max) +#define SMU_HCLK_FREQ_MAX(ndev) ((ndev)->priv->smu_hclk_freq_max) + +enum aie2_smu_reg_idx { + SMU_CMD_REG = 0, + SMU_ARG_REG, + SMU_INTR_REG, + SMU_RESP_REG, + SMU_OUT_REG, + SMU_MAX_REGS /* Keep this at the end */ +}; + +enum aie2_sram_reg_idx { + MBOX_CHANN_OFF = 0, + FW_ALIVE_OFF, + SRAM_MAX_INDEX /* Keep this at the end */ +}; + +enum psp_reg_idx { + PSP_CMD_REG = 0, + PSP_ARG0_REG, + PSP_ARG1_REG, + PSP_ARG2_REG, + PSP_NUM_IN_REGS, /* number of input registers */ + PSP_INTR_REG = PSP_NUM_IN_REGS, + PSP_STATUS_REG, + PSP_RESP_REG, + PSP_MAX_REGS /* Keep this at the end */ +}; + +struct amdxdna_client; +struct amdxdna_fw_ver; +struct amdxdna_hwctx; +struct amdxdna_sched_job; + +struct psp_config { + const void *fw_buf; + u32 fw_size; + void __iomem *psp_regs[PSP_MAX_REGS]; +}; + +struct aie_version { + u16 major; + u16 minor; +}; + +struct aie_tile_metadata { + u16 row_count; + u16 row_start; + u16 dma_channel_count; + u16 lock_count; + u16 event_reg_count; +}; + +struct aie_metadata { + u32 size; + u16 cols; + u16 rows; + struct aie_version version; + struct aie_tile_metadata core; + struct aie_tile_metadata mem; + struct aie_tile_metadata shim; +}; + +struct clock_entry { + char name[16]; + u32 freq_mhz; +}; + +struct rt_config { + u32 type; + u32 value; +}; + +/* + * Define the maximum number of pending commands in a hardware context. + * Must be power of 2! + */ +#define HWCTX_MAX_CMDS 4 +#define get_job_idx(seq) ((seq) & (HWCTX_MAX_CMDS - 1)) +struct amdxdna_hwctx_priv { + struct amdxdna_gem_obj *heap; + void *mbox_chann; + + struct drm_gpu_scheduler sched; + struct drm_sched_entity entity; + + struct mutex io_lock; /* protect seq and cmd order */ + struct wait_queue_head job_free_wq; + u32 num_pending; + u64 seq; + struct semaphore job_sem; + bool job_done; + + /* Completed job counter */ + u64 completed; + + struct amdxdna_gem_obj *cmd_buf[HWCTX_MAX_CMDS]; + struct drm_syncobj *syncobj; +}; + +struct amdxdna_dev_hdl { + struct amdxdna_dev *xdna; + const struct amdxdna_dev_priv *priv; + void __iomem *sram_base; + void __iomem *smu_base; + void __iomem *mbox_base; + struct psp_device *psp_hdl; + + struct xdna_mailbox_chann_res mgmt_x2i; + struct xdna_mailbox_chann_res mgmt_i2x; + u32 mgmt_chan_idx; + + u32 total_col; + struct aie_version version; + struct aie_metadata metadata; + struct clock_entry mp_npu_clock; + struct clock_entry h_clock; + + /* Mailbox and the management channel */ + struct mailbox *mbox; + struct mailbox_channel *mgmt_chann; + struct async_events *async_events; +}; + +#define DEFINE_BAR_OFFSET(reg_name, bar, reg_addr) \ + [reg_name] = {bar##_BAR_INDEX, (reg_addr) - bar##_BAR_BASE} + +struct aie2_bar_off_pair { + int bar_idx; + u32 offset; +}; + +struct amdxdna_dev_priv { + const char *fw_path; + u64 protocol_major; + u64 protocol_minor; + struct rt_config rt_config; +#define COL_ALIGN_NONE 0 +#define COL_ALIGN_NATURE 1 + u32 col_align; + u32 mbox_dev_addr; + /* If mbox_size is 0, use BAR size. See MBOX_SIZE macro */ + u32 mbox_size; + u32 sram_dev_addr; + struct aie2_bar_off_pair sram_offs[SRAM_MAX_INDEX]; + struct aie2_bar_off_pair psp_regs_off[PSP_MAX_REGS]; + struct aie2_bar_off_pair smu_regs_off[SMU_MAX_REGS]; + u32 smu_mpnpuclk_freq_max; + u32 smu_hclk_freq_max; +}; + +extern const struct amdxdna_dev_ops aie2_ops; + +/* aie2_smu.c */ +int aie2_smu_init(struct amdxdna_dev_hdl *ndev); +void aie2_smu_fini(struct amdxdna_dev_hdl *ndev); + +/* aie2_psp.c */ +struct psp_device *aie2m_psp_create(struct drm_device *ddev, struct psp_config *conf); +int aie2_psp_start(struct psp_device *psp); +void aie2_psp_stop(struct psp_device *psp); + +/* aie2_error.c */ +int aie2_error_async_events_alloc(struct amdxdna_dev_hdl *ndev); +void aie2_error_async_events_free(struct amdxdna_dev_hdl *ndev); +int aie2_error_async_events_send(struct amdxdna_dev_hdl *ndev); +int aie2_error_async_msg_thread(void *data); + +/* aie2_message.c */ +int aie2_suspend_fw(struct amdxdna_dev_hdl *ndev); +int aie2_resume_fw(struct amdxdna_dev_hdl *ndev); +int aie2_set_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 value); +int aie2_get_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 *value); +int aie2_check_protocol_version(struct amdxdna_dev_hdl *ndev); +int aie2_assign_mgmt_pasid(struct amdxdna_dev_hdl *ndev, u16 pasid); +int aie2_query_aie_version(struct amdxdna_dev_hdl *ndev, struct aie_version *version); +int aie2_query_aie_metadata(struct amdxdna_dev_hdl *ndev, struct aie_metadata *metadata); +int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev, + struct amdxdna_fw_ver *fw_ver); +int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx); +int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx); +int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u64 size); +int aie2_query_status(struct amdxdna_dev_hdl *ndev, char *buf, u32 size, u32 *cols_filled); +int aie2_register_asyn_event_msg(struct amdxdna_dev_hdl *ndev, dma_addr_t addr, u32 size, + void *handle, int (*cb)(void*, const u32 *, size_t)); +int aie2_config_cu(struct amdxdna_hwctx *hwctx); +int aie2_execbuf(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_cmdlist_single_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_cmdlist_multi_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_sync_bo(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); + +/* aie2_hwctx.c */ +int aie2_hwctx_init(struct amdxdna_hwctx *hwctx); +void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx); +int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *buf, u32 size); +void aie2_hwctx_suspend(struct amdxdna_hwctx *hwctx); +void aie2_hwctx_resume(struct amdxdna_hwctx *hwctx); +int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, u64 *seq); +void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, unsigned long cur_seq); +void aie2_restart_ctx(struct amdxdna_client *client); + +#endif /* _AIE2_PCI_H_ */ diff --git a/drivers/accel/amdxdna/aie2_psp.c b/drivers/accel/amdxdna/aie2_psp.c new file mode 100644 index 000000000000..dc3a072ce3b6 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_psp.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/drm_device.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/drm_managed.h> +#include <drm/drm_print.h> +#include <drm/gpu_scheduler.h> +#include <linux/bitfield.h> +#include <linux/iopoll.h> + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +#define PSP_STATUS_READY BIT(31) + +/* PSP commands */ +#define PSP_VALIDATE 1 +#define PSP_START 2 +#define PSP_RELEASE_TMR 3 + +/* PSP special arguments */ +#define PSP_START_COPY_FW 1 + +/* PSP response error code */ +#define PSP_ERROR_CANCEL 0xFFFF0002 +#define PSP_ERROR_BAD_STATE 0xFFFF0007 + +#define PSP_FW_ALIGN 0x10000 +#define PSP_POLL_INTERVAL 20000 /* us */ +#define PSP_POLL_TIMEOUT 1000000 /* us */ + +#define PSP_REG(p, reg) ((p)->psp_regs[reg]) + +struct psp_device { + struct drm_device *ddev; + struct psp_config conf; + u32 fw_buf_sz; + u64 fw_paddr; + void *fw_buffer; + void __iomem *psp_regs[PSP_MAX_REGS]; +}; + +static int psp_exec(struct psp_device *psp, u32 *reg_vals) +{ + u32 resp_code; + int ret, i; + u32 ready; + + /* Write command and argument registers */ + for (i = 0; i < PSP_NUM_IN_REGS; i++) + writel(reg_vals[i], PSP_REG(psp, i)); + + /* clear and set PSP INTR register to kick off */ + writel(0, PSP_REG(psp, PSP_INTR_REG)); + writel(1, PSP_REG(psp, PSP_INTR_REG)); + + /* PSP should be busy. Wait for ready, so we know task is done. */ + ret = readx_poll_timeout(readl, PSP_REG(psp, PSP_STATUS_REG), ready, + FIELD_GET(PSP_STATUS_READY, ready), + PSP_POLL_INTERVAL, PSP_POLL_TIMEOUT); + if (ret) { + drm_err(psp->ddev, "PSP is not ready, ret 0x%x", ret); + return ret; + } + + resp_code = readl(PSP_REG(psp, PSP_RESP_REG)); + if (resp_code) { + drm_err(psp->ddev, "fw return error 0x%x", resp_code); + return -EIO; + } + + return 0; +} + +void aie2_psp_stop(struct psp_device *psp) +{ + u32 reg_vals[PSP_NUM_IN_REGS] = { PSP_RELEASE_TMR, }; + int ret; + + ret = psp_exec(psp, reg_vals); + if (ret) + drm_err(psp->ddev, "release tmr failed, ret %d", ret); +} + +int aie2_psp_start(struct psp_device *psp) +{ + u32 reg_vals[PSP_NUM_IN_REGS]; + int ret; + + reg_vals[0] = PSP_VALIDATE; + reg_vals[1] = lower_32_bits(psp->fw_paddr); + reg_vals[2] = upper_32_bits(psp->fw_paddr); + reg_vals[3] = psp->fw_buf_sz; + + ret = psp_exec(psp, reg_vals); + if (ret) { + drm_err(psp->ddev, "failed to validate fw, ret %d", ret); + return ret; + } + + memset(reg_vals, 0, sizeof(reg_vals)); + reg_vals[0] = PSP_START; + reg_vals[1] = PSP_START_COPY_FW; + ret = psp_exec(psp, reg_vals); + if (ret) { + drm_err(psp->ddev, "failed to start fw, ret %d", ret); + return ret; + } + + return 0; +} + +struct psp_device *aie2m_psp_create(struct drm_device *ddev, struct psp_config *conf) +{ + struct psp_device *psp; + u64 offset; + + psp = drmm_kzalloc(ddev, sizeof(*psp), GFP_KERNEL); + if (!psp) + return NULL; + + psp->ddev = ddev; + memcpy(psp->psp_regs, conf->psp_regs, sizeof(psp->psp_regs)); + + psp->fw_buf_sz = ALIGN(conf->fw_size, PSP_FW_ALIGN) + PSP_FW_ALIGN; + psp->fw_buffer = drmm_kmalloc(ddev, psp->fw_buf_sz, GFP_KERNEL); + if (!psp->fw_buffer) { + drm_err(ddev, "no memory for fw buffer"); + return NULL; + } + + /* + * AMD Platform Security Processor(PSP) requires host physical + * address to load NPU firmware. + */ + psp->fw_paddr = virt_to_phys(psp->fw_buffer); + offset = ALIGN(psp->fw_paddr, PSP_FW_ALIGN) - psp->fw_paddr; + psp->fw_paddr += offset; + memcpy(psp->fw_buffer + offset, conf->fw_buf, conf->fw_size); + + return psp; +} diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_smu.c new file mode 100644 index 000000000000..91893d438da7 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_smu.c @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/drm_device.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/drm_print.h> +#include <drm/gpu_scheduler.h> +#include <linux/iopoll.h> + +#include "aie2_pci.h" +#include "amdxdna_pci_drv.h" + +#define SMU_RESULT_OK 1 + +/* SMU commands */ +#define AIE2_SMU_POWER_ON 0x3 +#define AIE2_SMU_POWER_OFF 0x4 +#define AIE2_SMU_SET_MPNPUCLK_FREQ 0x5 +#define AIE2_SMU_SET_HCLK_FREQ 0x6 + +static int aie2_smu_exec(struct amdxdna_dev_hdl *ndev, u32 reg_cmd, u32 reg_arg) +{ + u32 resp; + int ret; + + writel(0, SMU_REG(ndev, SMU_RESP_REG)); + writel(reg_arg, SMU_REG(ndev, SMU_ARG_REG)); + writel(reg_cmd, SMU_REG(ndev, SMU_CMD_REG)); + + /* Clear and set SMU_INTR_REG to kick off */ + writel(0, SMU_REG(ndev, SMU_INTR_REG)); + writel(1, SMU_REG(ndev, SMU_INTR_REG)); + + ret = readx_poll_timeout(readl, SMU_REG(ndev, SMU_RESP_REG), resp, + resp, AIE2_INTERVAL, AIE2_TIMEOUT); + if (ret) { + XDNA_ERR(ndev->xdna, "smu cmd %d timed out", reg_cmd); + return ret; + } + + if (resp != SMU_RESULT_OK) { + XDNA_ERR(ndev->xdna, "smu cmd %d failed, 0x%x", reg_cmd, resp); + return -EINVAL; + } + + return 0; +} + +static int aie2_smu_set_mpnpu_clock_freq(struct amdxdna_dev_hdl *ndev, u32 freq_mhz) +{ + int ret; + + if (!freq_mhz || freq_mhz > SMU_MPNPUCLK_FREQ_MAX(ndev)) { + XDNA_ERR(ndev->xdna, "invalid mpnpu clock freq %d", freq_mhz); + return -EINVAL; + } + + ndev->mp_npu_clock.freq_mhz = freq_mhz; + ret = aie2_smu_exec(ndev, AIE2_SMU_SET_MPNPUCLK_FREQ, freq_mhz); + if (!ret) + XDNA_INFO_ONCE(ndev->xdna, "set mpnpu_clock = %d mhz", freq_mhz); + + return ret; +} + +static int aie2_smu_set_hclock_freq(struct amdxdna_dev_hdl *ndev, u32 freq_mhz) +{ + int ret; + + if (!freq_mhz || freq_mhz > SMU_HCLK_FREQ_MAX(ndev)) { + XDNA_ERR(ndev->xdna, "invalid hclock freq %d", freq_mhz); + return -EINVAL; + } + + ndev->h_clock.freq_mhz = freq_mhz; + ret = aie2_smu_exec(ndev, AIE2_SMU_SET_HCLK_FREQ, freq_mhz); + if (!ret) + XDNA_INFO_ONCE(ndev->xdna, "set npu_hclock = %d mhz", freq_mhz); + + return ret; +} + +int aie2_smu_init(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_ON, 0); + if (ret) { + XDNA_ERR(ndev->xdna, "Power on failed, ret %d", ret); + return ret; + } + + ret = aie2_smu_set_mpnpu_clock_freq(ndev, SMU_MPNPUCLK_FREQ_MAX(ndev)); + if (ret) { + XDNA_ERR(ndev->xdna, "Set mpnpu clk freq failed, ret %d", ret); + return ret; + } + snprintf(ndev->mp_npu_clock.name, sizeof(ndev->mp_npu_clock.name), "MP-NPU Clock"); + + ret = aie2_smu_set_hclock_freq(ndev, SMU_HCLK_FREQ_MAX(ndev)); + if (ret) { + XDNA_ERR(ndev->xdna, "Set hclk freq failed, ret %d", ret); + return ret; + } + snprintf(ndev->h_clock.name, sizeof(ndev->h_clock.name), "H Clock"); + + return 0; +} + +void aie2_smu_fini(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_OFF, 0); + if (ret) + XDNA_ERR(ndev->xdna, "Power off failed, ret %d", ret); +} diff --git a/drivers/accel/amdxdna/aie2_solver.c b/drivers/accel/amdxdna/aie2_solver.c new file mode 100644 index 000000000000..a537c66589a4 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_solver.c @@ -0,0 +1,330 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/drm_device.h> +#include <drm/drm_managed.h> +#include <drm/drm_print.h> +#include <linux/bitops.h> +#include <linux/bitmap.h> + +#include "aie2_solver.h" + +struct partition_node { + struct list_head list; + u32 nshared; /* # shared requests */ + u32 start_col; /* start column */ + u32 ncols; /* # columns */ + bool exclusive; /* can not be shared if set */ +}; + +struct solver_node { + struct list_head list; + u64 rid; /* Request ID from consumer */ + + struct partition_node *pt_node; + void *cb_arg; + u32 cols_len; + u32 start_cols[] __counted_by(cols_len); +}; + +struct solver_rgroup { + u32 rgid; + u32 nnode; + u32 npartition_node; + + DECLARE_BITMAP(resbit, XRS_MAX_COL); + struct list_head node_list; + struct list_head pt_node_list; +}; + +struct solver_state { + struct solver_rgroup rgp; + struct init_config cfg; + struct xrs_action_ops *actions; +}; + +static u32 calculate_gops(struct aie_qos *rqos) +{ + u32 service_rate = 0; + + if (rqos->latency) + service_rate = (1000 / rqos->latency); + + if (rqos->fps > service_rate) + return rqos->fps * rqos->gops; + + return service_rate * rqos->gops; +} + +/* + * qos_meet() - Check the QOS request can be met. + */ +static int qos_meet(struct solver_state *xrs, struct aie_qos *rqos, u32 cgops) +{ + u32 request_gops = calculate_gops(rqos) * xrs->cfg.sys_eff_factor; + + if (request_gops <= cgops) + return 0; + + return -EINVAL; +} + +/* + * sanity_check() - Do a basic sanity check on allocation request. + */ +static int sanity_check(struct solver_state *xrs, struct alloc_requests *req) +{ + struct cdo_parts *cdop = &req->cdo; + struct aie_qos *rqos = &req->rqos; + u32 cu_clk_freq; + + if (cdop->ncols > xrs->cfg.total_col) + return -EINVAL; + + /* + * We can find at least one CDOs groups that meet the + * GOPs requirement. + */ + cu_clk_freq = xrs->cfg.clk_list.cu_clk_list[xrs->cfg.clk_list.num_levels - 1]; + + if (qos_meet(xrs, rqos, cdop->qos_cap.opc * cu_clk_freq / 1000)) + return -EINVAL; + + return 0; +} + +static struct solver_node *rg_search_node(struct solver_rgroup *rgp, u64 rid) +{ + struct solver_node *node; + + list_for_each_entry(node, &rgp->node_list, list) { + if (node->rid == rid) + return node; + } + + return NULL; +} + +static void remove_partition_node(struct solver_rgroup *rgp, + struct partition_node *pt_node) +{ + pt_node->nshared--; + if (pt_node->nshared > 0) + return; + + list_del(&pt_node->list); + rgp->npartition_node--; + + bitmap_clear(rgp->resbit, pt_node->start_col, pt_node->ncols); + kfree(pt_node); +} + +static void remove_solver_node(struct solver_rgroup *rgp, + struct solver_node *node) +{ + list_del(&node->list); + rgp->nnode--; + + if (node->pt_node) + remove_partition_node(rgp, node->pt_node); + + kfree(node); +} + +static int get_free_partition(struct solver_state *xrs, + struct solver_node *snode, + struct alloc_requests *req) +{ + struct partition_node *pt_node; + u32 ncols = req->cdo.ncols; + u32 col, i; + + for (i = 0; i < snode->cols_len; i++) { + col = snode->start_cols[i]; + if (find_next_bit(xrs->rgp.resbit, XRS_MAX_COL, col) >= col + ncols) + break; + } + + if (i == snode->cols_len) + return -ENODEV; + + pt_node = kzalloc(sizeof(*pt_node), GFP_KERNEL); + if (!pt_node) + return -ENOMEM; + + pt_node->nshared = 1; + pt_node->start_col = col; + pt_node->ncols = ncols; + + /* + * Before fully support latency in QoS, if a request + * specifies a non-zero latency value, it will not share + * the partition with other requests. + */ + if (req->rqos.latency) + pt_node->exclusive = true; + + list_add_tail(&pt_node->list, &xrs->rgp.pt_node_list); + xrs->rgp.npartition_node++; + bitmap_set(xrs->rgp.resbit, pt_node->start_col, pt_node->ncols); + + snode->pt_node = pt_node; + + return 0; +} + +static int allocate_partition(struct solver_state *xrs, + struct solver_node *snode, + struct alloc_requests *req) +{ + struct partition_node *pt_node, *rpt_node = NULL; + int idx, ret; + + ret = get_free_partition(xrs, snode, req); + if (!ret) + return ret; + + /* try to get a share-able partition */ + list_for_each_entry(pt_node, &xrs->rgp.pt_node_list, list) { + if (pt_node->exclusive) + continue; + + if (rpt_node && pt_node->nshared >= rpt_node->nshared) + continue; + + for (idx = 0; idx < snode->cols_len; idx++) { + if (snode->start_cols[idx] != pt_node->start_col) + continue; + + if (req->cdo.ncols != pt_node->ncols) + continue; + + rpt_node = pt_node; + break; + } + } + + if (!rpt_node) + return -ENODEV; + + rpt_node->nshared++; + snode->pt_node = rpt_node; + + return 0; +} + +static struct solver_node *create_solver_node(struct solver_state *xrs, + struct alloc_requests *req) +{ + struct cdo_parts *cdop = &req->cdo; + struct solver_node *node; + int ret; + + node = kzalloc(struct_size(node, start_cols, cdop->cols_len), GFP_KERNEL); + if (!node) + return ERR_PTR(-ENOMEM); + + node->rid = req->rid; + node->cols_len = cdop->cols_len; + memcpy(node->start_cols, cdop->start_cols, cdop->cols_len * sizeof(u32)); + + ret = allocate_partition(xrs, node, req); + if (ret) + goto free_node; + + list_add_tail(&node->list, &xrs->rgp.node_list); + xrs->rgp.nnode++; + return node; + +free_node: + kfree(node); + return ERR_PTR(ret); +} + +static void fill_load_action(struct solver_state *xrs, + struct solver_node *snode, + struct xrs_action_load *action) +{ + action->rid = snode->rid; + action->part.start_col = snode->pt_node->start_col; + action->part.ncols = snode->pt_node->ncols; +} + +int xrs_allocate_resource(void *hdl, struct alloc_requests *req, void *cb_arg) +{ + struct xrs_action_load load_act; + struct solver_node *snode; + struct solver_state *xrs; + int ret; + + xrs = (struct solver_state *)hdl; + + ret = sanity_check(xrs, req); + if (ret) { + drm_err(xrs->cfg.ddev, "invalid request"); + return ret; + } + + if (rg_search_node(&xrs->rgp, req->rid)) { + drm_err(xrs->cfg.ddev, "rid %lld is in-use", req->rid); + return -EEXIST; + } + + snode = create_solver_node(xrs, req); + if (IS_ERR(snode)) + return PTR_ERR(snode); + + fill_load_action(xrs, snode, &load_act); + ret = xrs->cfg.actions->load(cb_arg, &load_act); + if (ret) + goto free_node; + + snode->cb_arg = cb_arg; + + drm_dbg(xrs->cfg.ddev, "start col %d ncols %d\n", + snode->pt_node->start_col, snode->pt_node->ncols); + + return 0; + +free_node: + remove_solver_node(&xrs->rgp, snode); + + return ret; +} + +int xrs_release_resource(void *hdl, u64 rid) +{ + struct solver_state *xrs = hdl; + struct solver_node *node; + + node = rg_search_node(&xrs->rgp, rid); + if (!node) { + drm_err(xrs->cfg.ddev, "node not exist"); + return -ENODEV; + } + + xrs->cfg.actions->unload(node->cb_arg); + remove_solver_node(&xrs->rgp, node); + + return 0; +} + +void *xrsm_init(struct init_config *cfg) +{ + struct solver_rgroup *rgp; + struct solver_state *xrs; + + xrs = drmm_kzalloc(cfg->ddev, sizeof(*xrs), GFP_KERNEL); + if (!xrs) + return NULL; + + memcpy(&xrs->cfg, cfg, sizeof(*cfg)); + + rgp = &xrs->rgp; + INIT_LIST_HEAD(&rgp->node_list); + INIT_LIST_HEAD(&rgp->pt_node_list); + + return xrs; +} diff --git a/drivers/accel/amdxdna/aie2_solver.h b/drivers/accel/amdxdna/aie2_solver.h new file mode 100644 index 000000000000..9b1847bb46a6 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_solver.h @@ -0,0 +1,154 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_SOLVER_H +#define _AIE2_SOLVER_H + +#define XRS_MAX_COL 128 + +/* + * Structure used to describe a partition. A partition is column based + * allocation unit described by its start column and number of columns. + */ +struct aie_part { + u32 start_col; + u32 ncols; +}; + +/* + * The QoS capabilities of a given AIE partition. + */ +struct aie_qos_cap { + u32 opc; /* operations per cycle */ + u32 dma_bw; /* DMA bandwidth */ +}; + +/* + * QoS requirement of a resource allocation. + */ +struct aie_qos { + u32 gops; /* Giga operations */ + u32 fps; /* Frames per second */ + u32 dma_bw; /* DMA bandwidth */ + u32 latency; /* Frame response latency */ + u32 exec_time; /* Frame execution time */ + u32 priority; /* Request priority */ +}; + +/* + * Structure used to describe a relocatable CDO (Configuration Data Object). + */ +struct cdo_parts { + u32 *start_cols; /* Start column array */ + u32 cols_len; /* Length of start column array */ + u32 ncols; /* # of column */ + struct aie_qos_cap qos_cap; /* CDO QoS capabilities */ +}; + +/* + * Structure used to describe a request to allocate. + */ +struct alloc_requests { + u64 rid; + struct cdo_parts cdo; + struct aie_qos rqos; /* Requested QoS */ +}; + +/* + * Load callback argument + */ +struct xrs_action_load { + u32 rid; + struct aie_part part; +}; + +/* + * Define the power level available + * + * POWER_LEVEL_MIN: + * Lowest power level. Usually set when all actions are unloaded. + * + * POWER_LEVEL_n + * Power levels 0 - n, is a step increase in system frequencies + */ +enum power_level { + POWER_LEVEL_MIN = 0x0, + POWER_LEVEL_0 = 0x1, + POWER_LEVEL_1 = 0x2, + POWER_LEVEL_2 = 0x3, + POWER_LEVEL_3 = 0x4, + POWER_LEVEL_4 = 0x5, + POWER_LEVEL_5 = 0x6, + POWER_LEVEL_6 = 0x7, + POWER_LEVEL_7 = 0x8, + POWER_LEVEL_NUM, +}; + +/* + * Structure used to describe the frequency table. + * Resource solver chooses the frequency from the table + * to meet the QOS requirements. + */ +struct clk_list_info { + u32 num_levels; /* available power levels */ + u32 cu_clk_list[POWER_LEVEL_NUM]; /* available aie clock frequencies in Mhz*/ +}; + +struct xrs_action_ops { + int (*load)(void *cb_arg, struct xrs_action_load *action); + int (*unload)(void *cb_arg); +}; + +/* + * Structure used to describe information for solver during initialization. + */ +struct init_config { + u32 total_col; + u32 sys_eff_factor; /* system efficiency factor */ + u32 latency_adj; /* latency adjustment in ms */ + struct clk_list_info clk_list; /* List of frequencies available in system */ + struct drm_device *ddev; + struct xrs_action_ops *actions; +}; + +/* + * xrsm_init() - Register resource solver. Resource solver client needs + * to call this function to register itself. + * + * @cfg: The system metrics for resource solver to use + * + * Return: A resource solver handle + * + * Note: We should only create one handle per AIE array to be managed. + */ +void *xrsm_init(struct init_config *cfg); + +/* + * xrs_allocate_resource() - Request to allocate resources for a given context + * and a partition metadata. (See struct part_meta) + * + * @hdl: Resource solver handle obtained from xrs_init() + * @req: Input to the Resource solver including request id + * and partition metadata. + * @cb_arg: callback argument pointer + * + * Return: 0 when successful. + * Or standard error number when failing + * + * Note: + * There is no lock mechanism inside resource solver. So it is + * the caller's responsibility to lock down XCLBINs and grab + * necessary lock. + */ +int xrs_allocate_resource(void *hdl, struct alloc_requests *req, void *cb_arg); + +/* + * xrs_release_resource() - Request to free resources for a given context. + * + * @hdl: Resource solver handle obtained from xrs_init() + * @rid: The Request ID to identify the requesting context + */ +int xrs_release_resource(void *hdl, u64 rid); +#endif /* _AIE2_SOLVER_H */ diff --git a/drivers/accel/amdxdna/amdxdna_ctx.c b/drivers/accel/amdxdna/amdxdna_ctx.c new file mode 100644 index 000000000000..5478b631b73f --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_ctx.c @@ -0,0 +1,553 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_device.h> +#include <drm/drm_drv.h> +#include <drm/drm_file.h> +#include <drm/drm_gem.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/drm_print.h> +#include <drm/gpu_scheduler.h> +#include <trace/events/amdxdna.h> + +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_pci_drv.h" + +#define MAX_HWCTX_ID 255 +#define MAX_ARG_COUNT 4095 + +struct amdxdna_fence { + struct dma_fence base; + spinlock_t lock; /* for base */ + struct amdxdna_hwctx *hwctx; +}; + +static const char *amdxdna_fence_get_driver_name(struct dma_fence *fence) +{ + return KBUILD_MODNAME; +} + +static const char *amdxdna_fence_get_timeline_name(struct dma_fence *fence) +{ + struct amdxdna_fence *xdna_fence; + + xdna_fence = container_of(fence, struct amdxdna_fence, base); + + return xdna_fence->hwctx->name; +} + +static const struct dma_fence_ops fence_ops = { + .get_driver_name = amdxdna_fence_get_driver_name, + .get_timeline_name = amdxdna_fence_get_timeline_name, +}; + +static struct dma_fence *amdxdna_fence_create(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_fence *fence; + + fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (!fence) + return NULL; + + fence->hwctx = hwctx; + spin_lock_init(&fence->lock); + dma_fence_init(&fence->base, &fence_ops, &fence->lock, hwctx->id, 0); + return &fence->base; +} + +void amdxdna_hwctx_suspend(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_hwctx *hwctx; + int next = 0; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) + xdna->dev_info->ops->hwctx_suspend(hwctx); + mutex_unlock(&client->hwctx_lock); +} + +void amdxdna_hwctx_resume(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_hwctx *hwctx; + int next = 0; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) + xdna->dev_info->ops->hwctx_resume(hwctx); + mutex_unlock(&client->hwctx_lock); +} + +static void amdxdna_hwctx_destroy_rcu(struct amdxdna_hwctx *hwctx, + struct srcu_struct *ss) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + + synchronize_srcu(ss); + + /* At this point, user is not able to submit new commands */ + mutex_lock(&xdna->dev_lock); + xdna->dev_info->ops->hwctx_fini(hwctx); + mutex_unlock(&xdna->dev_lock); + + kfree(hwctx->name); + kfree(hwctx); +} + +void *amdxdna_cmd_get_payload(struct amdxdna_gem_obj *abo, u32 *size) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + u32 num_masks, count; + + if (amdxdna_cmd_get_op(abo) == ERT_CMD_CHAIN) + num_masks = 0; + else + num_masks = 1 + FIELD_GET(AMDXDNA_CMD_EXTRA_CU_MASK, cmd->header); + + if (size) { + count = FIELD_GET(AMDXDNA_CMD_COUNT, cmd->header); + if (unlikely(count <= num_masks)) { + *size = 0; + return NULL; + } + *size = (count - num_masks) * sizeof(u32); + } + return &cmd->data[num_masks]; +} + +int amdxdna_cmd_get_cu_idx(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + u32 num_masks, i; + u32 *cu_mask; + + if (amdxdna_cmd_get_op(abo) == ERT_CMD_CHAIN) + return -1; + + num_masks = 1 + FIELD_GET(AMDXDNA_CMD_EXTRA_CU_MASK, cmd->header); + cu_mask = cmd->data; + for (i = 0; i < num_masks; i++) { + if (cu_mask[i]) + return ffs(cu_mask[i]) - 1; + } + + return -1; +} + +/* + * This should be called in close() and remove(). DO NOT call in other syscalls. + * This guarantee that when hwctx and resources will be released, if user + * doesn't call amdxdna_drm_destroy_hwctx_ioctl. + */ +void amdxdna_hwctx_remove_all(struct amdxdna_client *client) +{ + struct amdxdna_hwctx *hwctx; + int next = 0; + + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) { + XDNA_DBG(client->xdna, "PID %d close HW context %d", + client->pid, hwctx->id); + idr_remove(&client->hwctx_idr, hwctx->id); + mutex_unlock(&client->hwctx_lock); + amdxdna_hwctx_destroy_rcu(hwctx, &client->hwctx_srcu); + mutex_lock(&client->hwctx_lock); + } + mutex_unlock(&client->hwctx_lock); +} + +int amdxdna_drm_create_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_drm_create_hwctx *args = data; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_hwctx *hwctx; + int ret, idx; + + if (args->ext || args->ext_flags) + return -EINVAL; + + if (!drm_dev_enter(dev, &idx)) + return -ENODEV; + + hwctx = kzalloc(sizeof(*hwctx), GFP_KERNEL); + if (!hwctx) { + ret = -ENOMEM; + goto exit; + } + + if (copy_from_user(&hwctx->qos, u64_to_user_ptr(args->qos_p), sizeof(hwctx->qos))) { + XDNA_ERR(xdna, "Access QoS info failed"); + ret = -EFAULT; + goto free_hwctx; + } + + hwctx->client = client; + hwctx->fw_ctx_id = -1; + hwctx->num_tiles = args->num_tiles; + hwctx->mem_size = args->mem_size; + hwctx->max_opc = args->max_opc; + mutex_lock(&client->hwctx_lock); + ret = idr_alloc_cyclic(&client->hwctx_idr, hwctx, 0, MAX_HWCTX_ID, GFP_KERNEL); + if (ret < 0) { + mutex_unlock(&client->hwctx_lock); + XDNA_ERR(xdna, "Allocate hwctx ID failed, ret %d", ret); + goto free_hwctx; + } + hwctx->id = ret; + mutex_unlock(&client->hwctx_lock); + + hwctx->name = kasprintf(GFP_KERNEL, "hwctx.%d.%d", client->pid, hwctx->id); + if (!hwctx->name) { + ret = -ENOMEM; + goto rm_id; + } + + mutex_lock(&xdna->dev_lock); + ret = xdna->dev_info->ops->hwctx_init(hwctx); + if (ret) { + mutex_unlock(&xdna->dev_lock); + XDNA_ERR(xdna, "Init hwctx failed, ret %d", ret); + goto free_name; + } + args->handle = hwctx->id; + args->syncobj_handle = hwctx->syncobj_hdl; + mutex_unlock(&xdna->dev_lock); + + XDNA_DBG(xdna, "PID %d create HW context %d, ret %d", client->pid, args->handle, ret); + drm_dev_exit(idx); + return 0; + +free_name: + kfree(hwctx->name); +rm_id: + mutex_lock(&client->hwctx_lock); + idr_remove(&client->hwctx_idr, hwctx->id); + mutex_unlock(&client->hwctx_lock); +free_hwctx: + kfree(hwctx); +exit: + drm_dev_exit(idx); + return ret; +} + +int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_drm_destroy_hwctx *args = data; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_hwctx *hwctx; + int ret = 0, idx; + + if (!drm_dev_enter(dev, &idx)) + return -ENODEV; + + /* + * Use hwctx_lock to achieve exclusion with other hwctx writers, + * SRCU to synchronize with exec/wait command ioctls. + * + * The pushed jobs are handled by DRM scheduler during destroy. + */ + mutex_lock(&client->hwctx_lock); + hwctx = idr_find(&client->hwctx_idr, args->handle); + if (!hwctx) { + mutex_unlock(&client->hwctx_lock); + ret = -EINVAL; + XDNA_DBG(xdna, "PID %d HW context %d not exist", + client->pid, args->handle); + goto out; + } + idr_remove(&client->hwctx_idr, hwctx->id); + mutex_unlock(&client->hwctx_lock); + + amdxdna_hwctx_destroy_rcu(hwctx, &client->hwctx_srcu); + + XDNA_DBG(xdna, "PID %d destroyed HW context %d", client->pid, args->handle); +out: + drm_dev_exit(idx); + return ret; +} + +int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_drm_config_hwctx *args = data; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_hwctx *hwctx; + int ret, idx; + u32 buf_size; + void *buf; + u64 val; + + if (!xdna->dev_info->ops->hwctx_config) + return -EOPNOTSUPP; + + val = args->param_val; + buf_size = args->param_val_size; + + switch (args->param_type) { + case DRM_AMDXDNA_HWCTX_CONFIG_CU: + /* For those types that param_val is pointer */ + if (buf_size > PAGE_SIZE) { + XDNA_ERR(xdna, "Config CU param buffer too large"); + return -E2BIG; + } + + /* Hwctx needs to keep buf */ + buf = kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + if (copy_from_user(buf, u64_to_user_ptr(val), buf_size)) { + kfree(buf); + return -EFAULT; + } + + break; + case DRM_AMDXDNA_HWCTX_ASSIGN_DBG_BUF: + case DRM_AMDXDNA_HWCTX_REMOVE_DBG_BUF: + /* For those types that param_val is a value */ + buf = NULL; + buf_size = 0; + break; + default: + XDNA_DBG(xdna, "Unknown HW context config type %d", args->param_type); + return -EINVAL; + } + + mutex_lock(&xdna->dev_lock); + idx = srcu_read_lock(&client->hwctx_srcu); + hwctx = idr_find(&client->hwctx_idr, args->handle); + if (!hwctx) { + XDNA_DBG(xdna, "PID %d failed to get hwctx %d", client->pid, args->handle); + ret = -EINVAL; + goto unlock_srcu; + } + + ret = xdna->dev_info->ops->hwctx_config(hwctx, args->param_type, val, buf, buf_size); + +unlock_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); + mutex_unlock(&xdna->dev_lock); + kfree(buf); + return ret; +} + +static void +amdxdna_arg_bos_put(struct amdxdna_sched_job *job) +{ + int i; + + for (i = 0; i < job->bo_cnt; i++) { + if (!job->bos[i]) + break; + drm_gem_object_put(job->bos[i]); + } +} + +static int +amdxdna_arg_bos_lookup(struct amdxdna_client *client, + struct amdxdna_sched_job *job, + u32 *bo_hdls, u32 bo_cnt) +{ + struct drm_gem_object *gobj; + int i, ret; + + job->bo_cnt = bo_cnt; + for (i = 0; i < job->bo_cnt; i++) { + struct amdxdna_gem_obj *abo; + + gobj = drm_gem_object_lookup(client->filp, bo_hdls[i]); + if (!gobj) { + ret = -ENOENT; + goto put_shmem_bo; + } + abo = to_xdna_obj(gobj); + + mutex_lock(&abo->lock); + if (abo->pinned) { + mutex_unlock(&abo->lock); + job->bos[i] = gobj; + continue; + } + + ret = amdxdna_gem_pin_nolock(abo); + if (ret) { + mutex_unlock(&abo->lock); + drm_gem_object_put(gobj); + goto put_shmem_bo; + } + abo->pinned = true; + mutex_unlock(&abo->lock); + + job->bos[i] = gobj; + } + + return 0; + +put_shmem_bo: + amdxdna_arg_bos_put(job); + return ret; +} + +void amdxdna_sched_job_cleanup(struct amdxdna_sched_job *job) +{ + trace_amdxdna_debug_point(job->hwctx->name, job->seq, "job release"); + amdxdna_arg_bos_put(job); + amdxdna_gem_put_obj(job->cmd_bo); +} + +int amdxdna_cmd_submit(struct amdxdna_client *client, + u32 cmd_bo_hdl, u32 *arg_bo_hdls, u32 arg_bo_cnt, + u32 hwctx_hdl, u64 *seq) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_sched_job *job; + struct amdxdna_hwctx *hwctx; + int ret, idx; + + XDNA_DBG(xdna, "Command BO hdl %d, Arg BO count %d", cmd_bo_hdl, arg_bo_cnt); + job = kzalloc(struct_size(job, bos, arg_bo_cnt), GFP_KERNEL); + if (!job) + return -ENOMEM; + + if (cmd_bo_hdl != AMDXDNA_INVALID_BO_HANDLE) { + job->cmd_bo = amdxdna_gem_get_obj(client, cmd_bo_hdl, AMDXDNA_BO_CMD); + if (!job->cmd_bo) { + XDNA_ERR(xdna, "Failed to get cmd bo from %d", cmd_bo_hdl); + ret = -EINVAL; + goto free_job; + } + } else { + job->cmd_bo = NULL; + } + + ret = amdxdna_arg_bos_lookup(client, job, arg_bo_hdls, arg_bo_cnt); + if (ret) { + XDNA_ERR(xdna, "Argument BOs lookup failed, ret %d", ret); + goto cmd_put; + } + + idx = srcu_read_lock(&client->hwctx_srcu); + hwctx = idr_find(&client->hwctx_idr, hwctx_hdl); + if (!hwctx) { + XDNA_DBG(xdna, "PID %d failed to get hwctx %d", + client->pid, hwctx_hdl); + ret = -EINVAL; + goto unlock_srcu; + } + + if (hwctx->status != HWCTX_STAT_READY) { + XDNA_ERR(xdna, "HW Context is not ready"); + ret = -EINVAL; + goto unlock_srcu; + } + + job->hwctx = hwctx; + job->mm = current->mm; + + job->fence = amdxdna_fence_create(hwctx); + if (!job->fence) { + XDNA_ERR(xdna, "Failed to create fence"); + ret = -ENOMEM; + goto unlock_srcu; + } + kref_init(&job->refcnt); + + ret = xdna->dev_info->ops->cmd_submit(hwctx, job, seq); + if (ret) + goto put_fence; + + /* + * The amdxdna_hwctx_destroy_rcu() will release hwctx and associated + * resource after synchronize_srcu(). The submitted jobs should be + * handled by the queue, for example DRM scheduler, in device layer. + * For here we can unlock SRCU. + */ + srcu_read_unlock(&client->hwctx_srcu, idx); + trace_amdxdna_debug_point(hwctx->name, *seq, "job pushed"); + + return 0; + +put_fence: + dma_fence_put(job->fence); +unlock_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); + amdxdna_arg_bos_put(job); +cmd_put: + amdxdna_gem_put_obj(job->cmd_bo); +free_job: + kfree(job); + return ret; +} + +/* + * The submit command ioctl submits a command to firmware. One firmware command + * may contain multiple command BOs for processing as a whole. + * The command sequence number is returned which can be used for wait command ioctl. + */ +static int amdxdna_drm_submit_execbuf(struct amdxdna_client *client, + struct amdxdna_drm_exec_cmd *args) +{ + struct amdxdna_dev *xdna = client->xdna; + u32 *arg_bo_hdls; + u32 cmd_bo_hdl; + int ret; + + if (!args->arg_count || args->arg_count > MAX_ARG_COUNT) { + XDNA_ERR(xdna, "Invalid arg bo count %d", args->arg_count); + return -EINVAL; + } + + /* Only support single command for now. */ + if (args->cmd_count != 1) { + XDNA_ERR(xdna, "Invalid cmd bo count %d", args->cmd_count); + return -EINVAL; + } + + cmd_bo_hdl = (u32)args->cmd_handles; + arg_bo_hdls = kcalloc(args->arg_count, sizeof(u32), GFP_KERNEL); + if (!arg_bo_hdls) + return -ENOMEM; + ret = copy_from_user(arg_bo_hdls, u64_to_user_ptr(args->args), + args->arg_count * sizeof(u32)); + if (ret) { + ret = -EFAULT; + goto free_cmd_bo_hdls; + } + + ret = amdxdna_cmd_submit(client, cmd_bo_hdl, arg_bo_hdls, + args->arg_count, args->hwctx, &args->seq); + if (ret) + XDNA_DBG(xdna, "Submit cmds failed, ret %d", ret); + +free_cmd_bo_hdls: + kfree(arg_bo_hdls); + if (!ret) + XDNA_DBG(xdna, "Pushed cmd %lld to scheduler", args->seq); + return ret; +} + +int amdxdna_drm_submit_cmd_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_drm_exec_cmd *args = data; + + if (args->ext || args->ext_flags) + return -EINVAL; + + switch (args->type) { + case AMDXDNA_CMD_SUBMIT_EXEC_BUF: + return amdxdna_drm_submit_execbuf(client, args); + } + + XDNA_ERR(client->xdna, "Invalid command type %d", args->type); + return -EINVAL; +} diff --git a/drivers/accel/amdxdna/amdxdna_ctx.h b/drivers/accel/amdxdna/amdxdna_ctx.h new file mode 100644 index 000000000000..80b0304193ec --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_ctx.h @@ -0,0 +1,162 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AMDXDNA_CTX_H_ +#define _AMDXDNA_CTX_H_ + +#include <linux/bitfield.h> + +#include "amdxdna_gem.h" + +struct amdxdna_hwctx_priv; + +enum ert_cmd_opcode { + ERT_START_CU = 0, + ERT_CMD_CHAIN = 19, + ERT_START_NPU = 20, +}; + +enum ert_cmd_state { + ERT_CMD_STATE_INVALID, + ERT_CMD_STATE_NEW, + ERT_CMD_STATE_QUEUED, + ERT_CMD_STATE_RUNNING, + ERT_CMD_STATE_COMPLETED, + ERT_CMD_STATE_ERROR, + ERT_CMD_STATE_ABORT, + ERT_CMD_STATE_SUBMITTED, + ERT_CMD_STATE_TIMEOUT, + ERT_CMD_STATE_NORESPONSE, +}; + +/* + * Interpretation of the beginning of data payload for ERT_START_NPU in + * amdxdna_cmd. The rest of the payload in amdxdna_cmd is regular kernel args. + */ +struct amdxdna_cmd_start_npu { + u64 buffer; /* instruction buffer address */ + u32 buffer_size; /* size of buffer in bytes */ + u32 prop_count; /* properties count */ + u32 prop_args[]; /* properties and regular kernel arguments */ +}; + +/* + * Interpretation of the beginning of data payload for ERT_CMD_CHAIN in + * amdxdna_cmd. The rest of the payload in amdxdna_cmd is cmd BO handles. + */ +struct amdxdna_cmd_chain { + u32 command_count; + u32 submit_index; + u32 error_index; + u32 reserved[3]; + u64 data[] __counted_by(command_count); +}; + +/* Exec buffer command header format */ +#define AMDXDNA_CMD_STATE GENMASK(3, 0) +#define AMDXDNA_CMD_EXTRA_CU_MASK GENMASK(11, 10) +#define AMDXDNA_CMD_COUNT GENMASK(22, 12) +#define AMDXDNA_CMD_OPCODE GENMASK(27, 23) +struct amdxdna_cmd { + u32 header; + u32 data[]; +}; + +struct amdxdna_hwctx { + struct amdxdna_client *client; + struct amdxdna_hwctx_priv *priv; + char *name; + + u32 id; + u32 max_opc; + u32 num_tiles; + u32 mem_size; + u32 fw_ctx_id; + u32 col_list_len; + u32 *col_list; + u32 start_col; + u32 num_col; +#define HWCTX_STAT_INIT 0 +#define HWCTX_STAT_READY 1 +#define HWCTX_STAT_STOP 2 + u32 status; + u32 old_status; + + struct amdxdna_qos_info qos; + struct amdxdna_hwctx_param_config_cu *cus; + u32 syncobj_hdl; +}; + +#define drm_job_to_xdna_job(j) \ + container_of(j, struct amdxdna_sched_job, base) + +struct amdxdna_sched_job { + struct drm_sched_job base; + struct kref refcnt; + struct amdxdna_hwctx *hwctx; + struct mm_struct *mm; + /* The fence to notice DRM scheduler that job is done by hardware */ + struct dma_fence *fence; + /* user can wait on this fence */ + struct dma_fence *out_fence; + bool job_done; + u64 seq; + struct amdxdna_gem_obj *cmd_bo; + size_t bo_cnt; + struct drm_gem_object *bos[] __counted_by(bo_cnt); +}; + +static inline u32 +amdxdna_cmd_get_op(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + + return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header); +} + +static inline void +amdxdna_cmd_set_state(struct amdxdna_gem_obj *abo, enum ert_cmd_state s) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + + cmd->header &= ~AMDXDNA_CMD_STATE; + cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s); +} + +static inline enum ert_cmd_state +amdxdna_cmd_get_state(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + + return FIELD_GET(AMDXDNA_CMD_STATE, cmd->header); +} + +void *amdxdna_cmd_get_payload(struct amdxdna_gem_obj *abo, u32 *size); +int amdxdna_cmd_get_cu_idx(struct amdxdna_gem_obj *abo); + +static inline u32 amdxdna_hwctx_col_map(struct amdxdna_hwctx *hwctx) +{ + return GENMASK(hwctx->start_col + hwctx->num_col - 1, + hwctx->start_col); +} + +void amdxdna_sched_job_cleanup(struct amdxdna_sched_job *job); +void amdxdna_hwctx_remove_all(struct amdxdna_client *client); +void amdxdna_hwctx_suspend(struct amdxdna_client *client); +void amdxdna_hwctx_resume(struct amdxdna_client *client); + +int amdxdna_cmd_submit(struct amdxdna_client *client, + u32 cmd_bo_hdls, u32 *arg_bo_hdls, u32 arg_bo_cnt, + u32 hwctx_hdl, u64 *seq); + +int amdxdna_cmd_wait(struct amdxdna_client *client, u32 hwctx_hdl, + u64 seq, u32 timeout); + +int amdxdna_drm_create_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_submit_cmd_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); + +#endif /* _AMDXDNA_CTX_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_gem.c b/drivers/accel/amdxdna/amdxdna_gem.c new file mode 100644 index 000000000000..4dfeca306d98 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_gem.c @@ -0,0 +1,622 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_cache.h> +#include <drm/drm_device.h> +#include <drm/drm_gem.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/gpu_scheduler.h> +#include <linux/iosys-map.h> +#include <linux/vmalloc.h> + +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_pci_drv.h" + +#define XDNA_MAX_CMD_BO_SIZE SZ_32K + +static int +amdxdna_gem_insert_node_locked(struct amdxdna_gem_obj *abo, bool use_vmap) +{ + struct amdxdna_client *client = abo->client; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_mem *mem = &abo->mem; + u64 offset; + u32 align; + int ret; + + align = 1 << max(PAGE_SHIFT, xdna->dev_info->dev_mem_buf_shift); + ret = drm_mm_insert_node_generic(&abo->dev_heap->mm, &abo->mm_node, + mem->size, align, + 0, DRM_MM_INSERT_BEST); + if (ret) { + XDNA_ERR(xdna, "Failed to alloc dev bo memory, ret %d", ret); + return ret; + } + + mem->dev_addr = abo->mm_node.start; + offset = mem->dev_addr - abo->dev_heap->mem.dev_addr; + mem->userptr = abo->dev_heap->mem.userptr + offset; + mem->pages = &abo->dev_heap->base.pages[offset >> PAGE_SHIFT]; + mem->nr_pages = mem->size >> PAGE_SHIFT; + + if (use_vmap) { + mem->kva = vmap(mem->pages, mem->nr_pages, VM_MAP, PAGE_KERNEL); + if (!mem->kva) { + XDNA_ERR(xdna, "Failed to vmap"); + drm_mm_remove_node(&abo->mm_node); + return -EFAULT; + } + } + + return 0; +} + +static void amdxdna_gem_obj_free(struct drm_gem_object *gobj) +{ + struct amdxdna_dev *xdna = to_xdna_dev(gobj->dev); + struct amdxdna_gem_obj *abo = to_xdna_obj(gobj); + struct iosys_map map = IOSYS_MAP_INIT_VADDR(abo->mem.kva); + + XDNA_DBG(xdna, "BO type %d xdna_addr 0x%llx", abo->type, abo->mem.dev_addr); + if (abo->pinned) + amdxdna_gem_unpin(abo); + + if (abo->type == AMDXDNA_BO_DEV) { + mutex_lock(&abo->client->mm_lock); + drm_mm_remove_node(&abo->mm_node); + mutex_unlock(&abo->client->mm_lock); + + vunmap(abo->mem.kva); + drm_gem_object_put(to_gobj(abo->dev_heap)); + drm_gem_object_release(gobj); + mutex_destroy(&abo->lock); + kfree(abo); + return; + } + + if (abo->type == AMDXDNA_BO_DEV_HEAP) + drm_mm_takedown(&abo->mm); + + drm_gem_vunmap_unlocked(gobj, &map); + mutex_destroy(&abo->lock); + drm_gem_shmem_free(&abo->base); +} + +static const struct drm_gem_object_funcs amdxdna_gem_dev_obj_funcs = { + .free = amdxdna_gem_obj_free, +}; + +static bool amdxdna_hmm_invalidate(struct mmu_interval_notifier *mni, + const struct mmu_notifier_range *range, + unsigned long cur_seq) +{ + struct amdxdna_gem_obj *abo = container_of(mni, struct amdxdna_gem_obj, + mem.notifier); + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + + XDNA_DBG(xdna, "Invalid range 0x%llx, 0x%lx, type %d", + abo->mem.userptr, abo->mem.size, abo->type); + + if (!mmu_notifier_range_blockable(range)) + return false; + + xdna->dev_info->ops->hmm_invalidate(abo, cur_seq); + + return true; +} + +static const struct mmu_interval_notifier_ops amdxdna_hmm_ops = { + .invalidate = amdxdna_hmm_invalidate, +}; + +static void amdxdna_hmm_unregister(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + + if (!xdna->dev_info->ops->hmm_invalidate) + return; + + mmu_interval_notifier_remove(&abo->mem.notifier); + kvfree(abo->mem.pfns); + abo->mem.pfns = NULL; +} + +static int amdxdna_hmm_register(struct amdxdna_gem_obj *abo, unsigned long addr, + size_t len) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + u32 nr_pages; + int ret; + + if (!xdna->dev_info->ops->hmm_invalidate) + return 0; + + if (abo->mem.pfns) + return -EEXIST; + + nr_pages = (PAGE_ALIGN(addr + len) - (addr & PAGE_MASK)) >> PAGE_SHIFT; + abo->mem.pfns = kvcalloc(nr_pages, sizeof(*abo->mem.pfns), + GFP_KERNEL); + if (!abo->mem.pfns) + return -ENOMEM; + + ret = mmu_interval_notifier_insert_locked(&abo->mem.notifier, + current->mm, + addr, + len, + &amdxdna_hmm_ops); + if (ret) { + XDNA_ERR(xdna, "Insert mmu notifier failed, ret %d", ret); + kvfree(abo->mem.pfns); + } + abo->mem.userptr = addr; + + return ret; +} + +static int amdxdna_gem_obj_mmap(struct drm_gem_object *gobj, + struct vm_area_struct *vma) +{ + struct amdxdna_gem_obj *abo = to_xdna_obj(gobj); + unsigned long num_pages; + int ret; + + ret = amdxdna_hmm_register(abo, vma->vm_start, gobj->size); + if (ret) + return ret; + + ret = drm_gem_shmem_mmap(&abo->base, vma); + if (ret) + goto hmm_unreg; + + num_pages = gobj->size >> PAGE_SHIFT; + /* Try to insert the pages */ + vm_flags_mod(vma, VM_MIXEDMAP, VM_PFNMAP); + ret = vm_insert_pages(vma, vma->vm_start, abo->base.pages, &num_pages); + if (ret) + XDNA_ERR(abo->client->xdna, "Failed insert pages, ret %d", ret); + + return 0; + +hmm_unreg: + amdxdna_hmm_unregister(abo); + return ret; +} + +static vm_fault_t amdxdna_gem_vm_fault(struct vm_fault *vmf) +{ + return drm_gem_shmem_vm_ops.fault(vmf); +} + +static void amdxdna_gem_vm_open(struct vm_area_struct *vma) +{ + drm_gem_shmem_vm_ops.open(vma); +} + +static void amdxdna_gem_vm_close(struct vm_area_struct *vma) +{ + struct drm_gem_object *gobj = vma->vm_private_data; + + amdxdna_hmm_unregister(to_xdna_obj(gobj)); + drm_gem_shmem_vm_ops.close(vma); +} + +static const struct vm_operations_struct amdxdna_gem_vm_ops = { + .fault = amdxdna_gem_vm_fault, + .open = amdxdna_gem_vm_open, + .close = amdxdna_gem_vm_close, +}; + +static const struct drm_gem_object_funcs amdxdna_gem_shmem_funcs = { + .free = amdxdna_gem_obj_free, + .print_info = drm_gem_shmem_object_print_info, + .pin = drm_gem_shmem_object_pin, + .unpin = drm_gem_shmem_object_unpin, + .get_sg_table = drm_gem_shmem_object_get_sg_table, + .vmap = drm_gem_shmem_object_vmap, + .vunmap = drm_gem_shmem_object_vunmap, + .mmap = amdxdna_gem_obj_mmap, + .vm_ops = &amdxdna_gem_vm_ops, +}; + +static struct amdxdna_gem_obj * +amdxdna_gem_create_obj(struct drm_device *dev, size_t size) +{ + struct amdxdna_gem_obj *abo; + + abo = kzalloc(sizeof(*abo), GFP_KERNEL); + if (!abo) + return ERR_PTR(-ENOMEM); + + abo->pinned = false; + abo->assigned_hwctx = AMDXDNA_INVALID_CTX_HANDLE; + mutex_init(&abo->lock); + + abo->mem.userptr = AMDXDNA_INVALID_ADDR; + abo->mem.dev_addr = AMDXDNA_INVALID_ADDR; + abo->mem.size = size; + + return abo; +} + +/* For drm_driver->gem_create_object callback */ +struct drm_gem_object * +amdxdna_gem_create_object_cb(struct drm_device *dev, size_t size) +{ + struct amdxdna_gem_obj *abo; + + abo = amdxdna_gem_create_obj(dev, size); + if (IS_ERR(abo)) + return ERR_CAST(abo); + + to_gobj(abo)->funcs = &amdxdna_gem_shmem_funcs; + + return to_gobj(abo); +} + +static struct amdxdna_gem_obj * +amdxdna_drm_alloc_shmem(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct drm_gem_shmem_object *shmem; + struct amdxdna_gem_obj *abo; + + shmem = drm_gem_shmem_create(dev, args->size); + if (IS_ERR(shmem)) + return ERR_CAST(shmem); + + shmem->map_wc = false; + + abo = to_xdna_obj(&shmem->base); + abo->client = client; + abo->type = AMDXDNA_BO_SHMEM; + + return abo; +} + +static struct amdxdna_gem_obj * +amdxdna_drm_create_dev_heap(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct drm_gem_shmem_object *shmem; + struct amdxdna_gem_obj *abo; + int ret; + + if (args->size > xdna->dev_info->dev_mem_size) { + XDNA_DBG(xdna, "Invalid dev heap size 0x%llx, limit 0x%lx", + args->size, xdna->dev_info->dev_mem_size); + return ERR_PTR(-EINVAL); + } + + mutex_lock(&client->mm_lock); + if (client->dev_heap) { + XDNA_DBG(client->xdna, "dev heap is already created"); + ret = -EBUSY; + goto mm_unlock; + } + + shmem = drm_gem_shmem_create(dev, args->size); + if (IS_ERR(shmem)) { + ret = PTR_ERR(shmem); + goto mm_unlock; + } + + shmem->map_wc = false; + abo = to_xdna_obj(&shmem->base); + + abo->type = AMDXDNA_BO_DEV_HEAP; + abo->client = client; + abo->mem.dev_addr = client->xdna->dev_info->dev_mem_base; + drm_mm_init(&abo->mm, abo->mem.dev_addr, abo->mem.size); + + client->dev_heap = abo; + drm_gem_object_get(to_gobj(abo)); + mutex_unlock(&client->mm_lock); + + return abo; + +mm_unlock: + mutex_unlock(&client->mm_lock); + return ERR_PTR(ret); +} + +struct amdxdna_gem_obj * +amdxdna_drm_alloc_dev_bo(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp, bool use_vmap) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + size_t aligned_sz = PAGE_ALIGN(args->size); + struct amdxdna_gem_obj *abo, *heap; + int ret; + + mutex_lock(&client->mm_lock); + heap = client->dev_heap; + if (!heap) { + ret = -EINVAL; + goto mm_unlock; + } + + if (heap->mem.userptr == AMDXDNA_INVALID_ADDR) { + XDNA_ERR(xdna, "Invalid dev heap userptr"); + ret = -EINVAL; + goto mm_unlock; + } + + if (args->size > heap->mem.size) { + XDNA_ERR(xdna, "Invalid dev bo size 0x%llx, limit 0x%lx", + args->size, heap->mem.size); + ret = -EINVAL; + goto mm_unlock; + } + + abo = amdxdna_gem_create_obj(&xdna->ddev, aligned_sz); + if (IS_ERR(abo)) { + ret = PTR_ERR(abo); + goto mm_unlock; + } + to_gobj(abo)->funcs = &amdxdna_gem_dev_obj_funcs; + abo->type = AMDXDNA_BO_DEV; + abo->client = client; + abo->dev_heap = heap; + ret = amdxdna_gem_insert_node_locked(abo, use_vmap); + if (ret) { + XDNA_ERR(xdna, "Failed to alloc dev bo memory, ret %d", ret); + goto mm_unlock; + } + + drm_gem_object_get(to_gobj(heap)); + drm_gem_private_object_init(&xdna->ddev, to_gobj(abo), aligned_sz); + + mutex_unlock(&client->mm_lock); + return abo; + +mm_unlock: + mutex_unlock(&client->mm_lock); + return ERR_PTR(ret); +} + +static struct amdxdna_gem_obj * +amdxdna_drm_create_cmd_bo(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp) +{ + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct drm_gem_shmem_object *shmem; + struct amdxdna_gem_obj *abo; + struct iosys_map map; + int ret; + + if (args->size > XDNA_MAX_CMD_BO_SIZE) { + XDNA_ERR(xdna, "Command bo size 0x%llx too large", args->size); + return ERR_PTR(-EINVAL); + } + + if (args->size < sizeof(struct amdxdna_cmd)) { + XDNA_DBG(xdna, "Command BO size 0x%llx too small", args->size); + return ERR_PTR(-EINVAL); + } + + shmem = drm_gem_shmem_create(dev, args->size); + if (IS_ERR(shmem)) + return ERR_CAST(shmem); + + shmem->map_wc = false; + abo = to_xdna_obj(&shmem->base); + + abo->type = AMDXDNA_BO_CMD; + abo->client = filp->driver_priv; + + ret = drm_gem_vmap_unlocked(to_gobj(abo), &map); + if (ret) { + XDNA_ERR(xdna, "Vmap cmd bo failed, ret %d", ret); + goto release_obj; + } + abo->mem.kva = map.vaddr; + + return abo; + +release_obj: + drm_gem_shmem_free(shmem); + return ERR_PTR(ret); +} + +int amdxdna_drm_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_drm_create_bo *args = data; + struct amdxdna_gem_obj *abo; + int ret; + + if (args->flags || args->vaddr || !args->size) + return -EINVAL; + + XDNA_DBG(xdna, "BO arg type %d vaddr 0x%llx size 0x%llx flags 0x%llx", + args->type, args->vaddr, args->size, args->flags); + switch (args->type) { + case AMDXDNA_BO_SHMEM: + abo = amdxdna_drm_alloc_shmem(dev, args, filp); + break; + case AMDXDNA_BO_DEV_HEAP: + abo = amdxdna_drm_create_dev_heap(dev, args, filp); + break; + case AMDXDNA_BO_DEV: + abo = amdxdna_drm_alloc_dev_bo(dev, args, filp, false); + break; + case AMDXDNA_BO_CMD: + abo = amdxdna_drm_create_cmd_bo(dev, args, filp); + break; + default: + return -EINVAL; + } + if (IS_ERR(abo)) + return PTR_ERR(abo); + + /* ready to publish object to userspace */ + ret = drm_gem_handle_create(filp, to_gobj(abo), &args->handle); + if (ret) { + XDNA_ERR(xdna, "Create handle failed"); + goto put_obj; + } + + XDNA_DBG(xdna, "BO hdl %d type %d userptr 0x%llx xdna_addr 0x%llx size 0x%lx", + args->handle, args->type, abo->mem.userptr, + abo->mem.dev_addr, abo->mem.size); +put_obj: + /* Dereference object reference. Handle holds it now. */ + drm_gem_object_put(to_gobj(abo)); + return ret; +} + +int amdxdna_gem_pin_nolock(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + int ret; + + switch (abo->type) { + case AMDXDNA_BO_SHMEM: + case AMDXDNA_BO_DEV_HEAP: + ret = drm_gem_shmem_pin(&abo->base); + break; + case AMDXDNA_BO_DEV: + ret = drm_gem_shmem_pin(&abo->dev_heap->base); + break; + default: + ret = -EOPNOTSUPP; + } + + XDNA_DBG(xdna, "BO type %d ret %d", abo->type, ret); + return ret; +} + +int amdxdna_gem_pin(struct amdxdna_gem_obj *abo) +{ + int ret; + + if (abo->type == AMDXDNA_BO_DEV) + abo = abo->dev_heap; + + mutex_lock(&abo->lock); + ret = amdxdna_gem_pin_nolock(abo); + mutex_unlock(&abo->lock); + + return ret; +} + +void amdxdna_gem_unpin(struct amdxdna_gem_obj *abo) +{ + if (abo->type == AMDXDNA_BO_DEV) + abo = abo->dev_heap; + + mutex_lock(&abo->lock); + drm_gem_shmem_unpin(&abo->base); + mutex_unlock(&abo->lock); +} + +struct amdxdna_gem_obj *amdxdna_gem_get_obj(struct amdxdna_client *client, + u32 bo_hdl, u8 bo_type) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_gem_obj *abo; + struct drm_gem_object *gobj; + + gobj = drm_gem_object_lookup(client->filp, bo_hdl); + if (!gobj) { + XDNA_DBG(xdna, "Can not find bo %d", bo_hdl); + return NULL; + } + + abo = to_xdna_obj(gobj); + if (bo_type == AMDXDNA_BO_INVALID || abo->type == bo_type) + return abo; + + drm_gem_object_put(gobj); + return NULL; +} + +int amdxdna_drm_get_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_drm_get_bo_info *args = data; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_gem_obj *abo; + struct drm_gem_object *gobj; + int ret = 0; + + if (args->ext || args->ext_flags) + return -EINVAL; + + gobj = drm_gem_object_lookup(filp, args->handle); + if (!gobj) { + XDNA_DBG(xdna, "Lookup GEM object %d failed", args->handle); + return -ENOENT; + } + + abo = to_xdna_obj(gobj); + args->vaddr = abo->mem.userptr; + args->xdna_addr = abo->mem.dev_addr; + + if (abo->type != AMDXDNA_BO_DEV) + args->map_offset = drm_vma_node_offset_addr(&gobj->vma_node); + else + args->map_offset = AMDXDNA_INVALID_ADDR; + + XDNA_DBG(xdna, "BO hdl %d map_offset 0x%llx vaddr 0x%llx xdna_addr 0x%llx", + args->handle, args->map_offset, args->vaddr, args->xdna_addr); + + drm_gem_object_put(gobj); + return ret; +} + +/* + * The sync bo ioctl is to make sure the CPU cache is in sync with memory. + * This is required because NPU is not cache coherent device. CPU cache + * flushing/invalidation is expensive so it is best to handle this outside + * of the command submission path. This ioctl allows explicit cache + * flushing/invalidation outside of the critical path. + */ +int amdxdna_drm_sync_bo_ioctl(struct drm_device *dev, + void *data, struct drm_file *filp) +{ + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_drm_sync_bo *args = data; + struct amdxdna_gem_obj *abo; + struct drm_gem_object *gobj; + int ret; + + gobj = drm_gem_object_lookup(filp, args->handle); + if (!gobj) { + XDNA_ERR(xdna, "Lookup GEM object failed"); + return -ENOENT; + } + abo = to_xdna_obj(gobj); + + ret = amdxdna_gem_pin(abo); + if (ret) { + XDNA_ERR(xdna, "Pin BO %d failed, ret %d", args->handle, ret); + goto put_obj; + } + + if (abo->type == AMDXDNA_BO_DEV) + drm_clflush_pages(abo->mem.pages, abo->mem.nr_pages); + else + drm_clflush_pages(abo->base.pages, gobj->size >> PAGE_SHIFT); + + amdxdna_gem_unpin(abo); + + XDNA_DBG(xdna, "Sync bo %d offset 0x%llx, size 0x%llx\n", + args->handle, args->offset, args->size); + +put_obj: + drm_gem_object_put(gobj); + return ret; +} diff --git a/drivers/accel/amdxdna/amdxdna_gem.h b/drivers/accel/amdxdna/amdxdna_gem.h new file mode 100644 index 000000000000..8ccc0375dd9d --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_gem.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AMDXDNA_GEM_H_ +#define _AMDXDNA_GEM_H_ + +struct amdxdna_mem { + u64 userptr; + void *kva; + u64 dev_addr; + size_t size; + struct page **pages; + u32 nr_pages; + struct mmu_interval_notifier notifier; + unsigned long *pfns; + bool map_invalid; +}; + +struct amdxdna_gem_obj { + struct drm_gem_shmem_object base; + struct amdxdna_client *client; + u8 type; + bool pinned; + struct mutex lock; /* Protects: pinned */ + struct amdxdna_mem mem; + + /* Below members is uninitialized when needed */ + struct drm_mm mm; /* For AMDXDNA_BO_DEV_HEAP */ + struct amdxdna_gem_obj *dev_heap; /* For AMDXDNA_BO_DEV */ + struct drm_mm_node mm_node; /* For AMDXDNA_BO_DEV */ + u32 assigned_hwctx; +}; + +#define to_gobj(obj) (&(obj)->base.base) + +static inline struct amdxdna_gem_obj *to_xdna_obj(struct drm_gem_object *gobj) +{ + return container_of(gobj, struct amdxdna_gem_obj, base.base); +} + +struct amdxdna_gem_obj *amdxdna_gem_get_obj(struct amdxdna_client *client, + u32 bo_hdl, u8 bo_type); +static inline void amdxdna_gem_put_obj(struct amdxdna_gem_obj *abo) +{ + drm_gem_object_put(to_gobj(abo)); +} + +struct drm_gem_object * +amdxdna_gem_create_object_cb(struct drm_device *dev, size_t size); +struct amdxdna_gem_obj * +amdxdna_drm_alloc_dev_bo(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp, bool use_vmap); + +int amdxdna_gem_pin_nolock(struct amdxdna_gem_obj *abo); +int amdxdna_gem_pin(struct amdxdna_gem_obj *abo); +void amdxdna_gem_unpin(struct amdxdna_gem_obj *abo); + +int amdxdna_drm_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_get_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_sync_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); + +#endif /* _AMDXDNA_GEM_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_mailbox.c b/drivers/accel/amdxdna/amdxdna_mailbox.c new file mode 100644 index 000000000000..415d99abaaa3 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox.c @@ -0,0 +1,576 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/drm_device.h> +#include <drm/drm_managed.h> +#include <linux/bitfield.h> +#include <linux/iopoll.h> + +#define CREATE_TRACE_POINTS +#include <trace/events/amdxdna.h> + +#include "amdxdna_mailbox.h" + +#define MB_ERR(chann, fmt, args...) \ +({ \ + typeof(chann) _chann = chann; \ + dev_err((_chann)->mb->dev, "xdna_mailbox.%d: "fmt, \ + (_chann)->msix_irq, ##args); \ +}) +#define MB_DBG(chann, fmt, args...) \ +({ \ + typeof(chann) _chann = chann; \ + dev_dbg((_chann)->mb->dev, "xdna_mailbox.%d: "fmt, \ + (_chann)->msix_irq, ##args); \ +}) +#define MB_WARN_ONCE(chann, fmt, args...) \ +({ \ + typeof(chann) _chann = chann; \ + dev_warn_once((_chann)->mb->dev, "xdna_mailbox.%d: "fmt, \ + (_chann)->msix_irq, ##args); \ +}) + +#define MAGIC_VAL 0x1D000000U +#define MAGIC_VAL_MASK 0xFF000000 +#define MAX_MSG_ID_ENTRIES 256 +#define MSG_RX_TIMER 200 /* milliseconds */ +#define MAILBOX_NAME "xdna_mailbox" + +enum channel_res_type { + CHAN_RES_X2I, + CHAN_RES_I2X, + CHAN_RES_NUM +}; + +struct mailbox { + struct device *dev; + struct xdna_mailbox_res res; +}; + +struct mailbox_channel { + struct mailbox *mb; + struct xdna_mailbox_chann_res res[CHAN_RES_NUM]; + int msix_irq; + u32 iohub_int_addr; + struct idr chan_idr; + spinlock_t chan_idr_lock; /* protect chan_idr */ + u32 x2i_tail; + + /* Received msg related fields */ + struct workqueue_struct *work_q; + struct work_struct rx_work; + u32 i2x_head; + bool bad_state; +}; + +#define MSG_BODY_SZ GENMASK(10, 0) +#define MSG_PROTO_VER GENMASK(23, 16) +struct xdna_msg_header { + __u32 total_size; + __u32 sz_ver; + __u32 id; + __u32 opcode; +} __packed; + +static_assert(sizeof(struct xdna_msg_header) == 16); + +struct mailbox_pkg { + struct xdna_msg_header header; + __u32 payload[]; +}; + +/* The protocol version. */ +#define MSG_PROTOCOL_VERSION 0x1 +/* The tombstone value. */ +#define TOMBSTONE 0xDEADFACE + +struct mailbox_msg { + void *handle; + int (*notify_cb)(void *handle, const u32 *data, size_t size); + size_t pkg_size; /* package size in bytes */ + struct mailbox_pkg pkg; +}; + +static void mailbox_reg_write(struct mailbox_channel *mb_chann, u32 mbox_reg, u32 data) +{ + struct xdna_mailbox_res *mb_res = &mb_chann->mb->res; + u64 ringbuf_addr = mb_res->mbox_base + mbox_reg; + + writel(data, (void *)ringbuf_addr); +} + +static u32 mailbox_reg_read(struct mailbox_channel *mb_chann, u32 mbox_reg) +{ + struct xdna_mailbox_res *mb_res = &mb_chann->mb->res; + u64 ringbuf_addr = mb_res->mbox_base + mbox_reg; + + return readl((void *)ringbuf_addr); +} + +static int mailbox_reg_read_non_zero(struct mailbox_channel *mb_chann, u32 mbox_reg, u32 *val) +{ + struct xdna_mailbox_res *mb_res = &mb_chann->mb->res; + u64 ringbuf_addr = mb_res->mbox_base + mbox_reg; + int ret, value; + + /* Poll till value is not zero */ + ret = readx_poll_timeout(readl, (void *)ringbuf_addr, value, + value, 1 /* us */, 100); + if (ret < 0) + return ret; + + *val = value; + return 0; +} + +static inline void +mailbox_set_headptr(struct mailbox_channel *mb_chann, u32 headptr_val) +{ + mailbox_reg_write(mb_chann, mb_chann->res[CHAN_RES_I2X].mb_head_ptr_reg, headptr_val); + mb_chann->i2x_head = headptr_val; +} + +static inline void +mailbox_set_tailptr(struct mailbox_channel *mb_chann, u32 tailptr_val) +{ + mailbox_reg_write(mb_chann, mb_chann->res[CHAN_RES_X2I].mb_tail_ptr_reg, tailptr_val); + mb_chann->x2i_tail = tailptr_val; +} + +static inline u32 +mailbox_get_headptr(struct mailbox_channel *mb_chann, enum channel_res_type type) +{ + return mailbox_reg_read(mb_chann, mb_chann->res[type].mb_head_ptr_reg); +} + +static inline u32 +mailbox_get_tailptr(struct mailbox_channel *mb_chann, enum channel_res_type type) +{ + return mailbox_reg_read(mb_chann, mb_chann->res[type].mb_tail_ptr_reg); +} + +static inline u32 +mailbox_get_ringbuf_size(struct mailbox_channel *mb_chann, enum channel_res_type type) +{ + return mb_chann->res[type].rb_size; +} + +static inline int mailbox_validate_msgid(int msg_id) +{ + return (msg_id & MAGIC_VAL_MASK) == MAGIC_VAL; +} + +static int mailbox_acquire_msgid(struct mailbox_channel *mb_chann, struct mailbox_msg *mb_msg) +{ + unsigned long flags; + int msg_id; + + spin_lock_irqsave(&mb_chann->chan_idr_lock, flags); + msg_id = idr_alloc_cyclic(&mb_chann->chan_idr, mb_msg, 0, + MAX_MSG_ID_ENTRIES, GFP_NOWAIT); + spin_unlock_irqrestore(&mb_chann->chan_idr_lock, flags); + if (msg_id < 0) + return msg_id; + + /* + * The IDR becomes less efficient when dealing with larger IDs. + * Thus, add MAGIC_VAL to the higher bits. + */ + msg_id |= MAGIC_VAL; + return msg_id; +} + +static void mailbox_release_msgid(struct mailbox_channel *mb_chann, int msg_id) +{ + unsigned long flags; + + msg_id &= ~MAGIC_VAL_MASK; + spin_lock_irqsave(&mb_chann->chan_idr_lock, flags); + idr_remove(&mb_chann->chan_idr, msg_id); + spin_unlock_irqrestore(&mb_chann->chan_idr_lock, flags); +} + +static int mailbox_release_msg(int id, void *p, void *data) +{ + struct mailbox_channel *mb_chann = data; + struct mailbox_msg *mb_msg = p; + + MB_DBG(mb_chann, "msg_id 0x%x msg opcode 0x%x", + mb_msg->pkg.header.id, mb_msg->pkg.header.opcode); + mb_msg->notify_cb(mb_msg->handle, NULL, 0); + kfree(mb_msg); + + return 0; +} + +static int +mailbox_send_msg(struct mailbox_channel *mb_chann, struct mailbox_msg *mb_msg) +{ + u32 ringbuf_size; + u32 head, tail; + u32 start_addr; + u64 write_addr; + u32 tmp_tail; + + head = mailbox_get_headptr(mb_chann, CHAN_RES_X2I); + tail = mb_chann->x2i_tail; + ringbuf_size = mailbox_get_ringbuf_size(mb_chann, CHAN_RES_X2I); + start_addr = mb_chann->res[CHAN_RES_X2I].rb_start_addr; + tmp_tail = tail + mb_msg->pkg_size; + + if (tail < head && tmp_tail >= head) + goto no_space; + + if (tail >= head && (tmp_tail > ringbuf_size - sizeof(u32) && + mb_msg->pkg_size >= head)) + goto no_space; + + if (tail >= head && tmp_tail > ringbuf_size - sizeof(u32)) { + write_addr = mb_chann->mb->res.ringbuf_base + start_addr + tail; + writel(TOMBSTONE, (void *)write_addr); + + /* tombstone is set. Write from the start of the ringbuf */ + tail = 0; + } + + write_addr = mb_chann->mb->res.ringbuf_base + start_addr + tail; + memcpy_toio((void *)write_addr, &mb_msg->pkg, mb_msg->pkg_size); + mailbox_set_tailptr(mb_chann, tail + mb_msg->pkg_size); + + trace_mbox_set_tail(MAILBOX_NAME, mb_chann->msix_irq, + mb_msg->pkg.header.opcode, + mb_msg->pkg.header.id); + + return 0; + +no_space: + return -ENOSPC; +} + +static int +mailbox_get_resp(struct mailbox_channel *mb_chann, struct xdna_msg_header *header, + void *data) +{ + struct mailbox_msg *mb_msg; + unsigned long flags; + int msg_id; + int ret; + + msg_id = header->id; + if (!mailbox_validate_msgid(msg_id)) { + MB_ERR(mb_chann, "Bad message ID 0x%x", msg_id); + return -EINVAL; + } + + msg_id &= ~MAGIC_VAL_MASK; + spin_lock_irqsave(&mb_chann->chan_idr_lock, flags); + mb_msg = idr_find(&mb_chann->chan_idr, msg_id); + if (!mb_msg) { + MB_ERR(mb_chann, "Cannot find msg 0x%x", msg_id); + spin_unlock_irqrestore(&mb_chann->chan_idr_lock, flags); + return -EINVAL; + } + idr_remove(&mb_chann->chan_idr, msg_id); + spin_unlock_irqrestore(&mb_chann->chan_idr_lock, flags); + + MB_DBG(mb_chann, "opcode 0x%x size %d id 0x%x", + header->opcode, header->total_size, header->id); + ret = mb_msg->notify_cb(mb_msg->handle, data, header->total_size); + if (unlikely(ret)) + MB_ERR(mb_chann, "Message callback ret %d", ret); + + kfree(mb_msg); + return ret; +} + +static int mailbox_get_msg(struct mailbox_channel *mb_chann) +{ + struct xdna_msg_header header; + u32 msg_size, rest; + u32 ringbuf_size; + u32 head, tail; + u32 start_addr; + u64 read_addr; + int ret; + + if (mailbox_reg_read_non_zero(mb_chann, mb_chann->res[CHAN_RES_I2X].mb_tail_ptr_reg, &tail)) + return -EINVAL; + head = mb_chann->i2x_head; + ringbuf_size = mailbox_get_ringbuf_size(mb_chann, CHAN_RES_I2X); + start_addr = mb_chann->res[CHAN_RES_I2X].rb_start_addr; + + if (unlikely(tail > ringbuf_size || !IS_ALIGNED(tail, 4))) { + MB_WARN_ONCE(mb_chann, "Invalid tail 0x%x", tail); + return -EINVAL; + } + + /* ringbuf empty */ + if (head == tail) + return -ENOENT; + + if (head == ringbuf_size) + head = 0; + + /* Peek size of the message or TOMBSTONE */ + read_addr = mb_chann->mb->res.ringbuf_base + start_addr + head; + header.total_size = readl((void *)read_addr); + /* size is TOMBSTONE, set next read from 0 */ + if (header.total_size == TOMBSTONE) { + if (head < tail) { + MB_WARN_ONCE(mb_chann, "Tombstone, head 0x%x tail 0x%x", + head, tail); + return -EINVAL; + } + mailbox_set_headptr(mb_chann, 0); + return 0; + } + + if (unlikely(!header.total_size || !IS_ALIGNED(header.total_size, 4))) { + MB_WARN_ONCE(mb_chann, "Invalid total size 0x%x", header.total_size); + return -EINVAL; + } + msg_size = sizeof(header) + header.total_size; + + if (msg_size > ringbuf_size - head || msg_size > tail - head) { + MB_WARN_ONCE(mb_chann, "Invalid message size %d, tail %d, head %d", + msg_size, tail, head); + return -EINVAL; + } + + rest = sizeof(header) - sizeof(u32); + read_addr += sizeof(u32); + memcpy_fromio((u32 *)&header + 1, (void *)read_addr, rest); + read_addr += rest; + + ret = mailbox_get_resp(mb_chann, &header, (u32 *)read_addr); + + mailbox_set_headptr(mb_chann, head + msg_size); + /* After update head, it can equal to ringbuf_size. This is expected. */ + trace_mbox_set_head(MAILBOX_NAME, mb_chann->msix_irq, + header.opcode, header.id); + + return ret; +} + +static irqreturn_t mailbox_irq_handler(int irq, void *p) +{ + struct mailbox_channel *mb_chann = p; + + trace_mbox_irq_handle(MAILBOX_NAME, irq); + /* Schedule a rx_work to call the callback functions */ + queue_work(mb_chann->work_q, &mb_chann->rx_work); + /* Clear IOHUB register */ + mailbox_reg_write(mb_chann, mb_chann->iohub_int_addr, 0); + + return IRQ_HANDLED; +} + +static void mailbox_rx_worker(struct work_struct *rx_work) +{ + struct mailbox_channel *mb_chann; + int ret; + + mb_chann = container_of(rx_work, struct mailbox_channel, rx_work); + + if (READ_ONCE(mb_chann->bad_state)) { + MB_ERR(mb_chann, "Channel in bad state, work aborted"); + return; + } + + while (1) { + /* + * If return is 0, keep consuming next message, until there is + * no messages or an error happened. + */ + ret = mailbox_get_msg(mb_chann); + if (ret == -ENOENT) + break; + + /* Other error means device doesn't look good, disable irq. */ + if (unlikely(ret)) { + MB_ERR(mb_chann, "Unexpected ret %d, disable irq", ret); + WRITE_ONCE(mb_chann->bad_state, true); + disable_irq(mb_chann->msix_irq); + break; + } + } +} + +int xdna_mailbox_send_msg(struct mailbox_channel *mb_chann, + const struct xdna_mailbox_msg *msg, u64 tx_timeout) +{ + struct xdna_msg_header *header; + struct mailbox_msg *mb_msg; + size_t pkg_size; + int ret; + + pkg_size = sizeof(*header) + msg->send_size; + if (pkg_size > mailbox_get_ringbuf_size(mb_chann, CHAN_RES_X2I)) { + MB_ERR(mb_chann, "Message size larger than ringbuf size"); + return -EINVAL; + } + + if (unlikely(!IS_ALIGNED(msg->send_size, 4))) { + MB_ERR(mb_chann, "Message must be 4 bytes align"); + return -EINVAL; + } + + /* The fist word in payload can NOT be TOMBSTONE */ + if (unlikely(((u32 *)msg->send_data)[0] == TOMBSTONE)) { + MB_ERR(mb_chann, "Tomb stone in data"); + return -EINVAL; + } + + if (READ_ONCE(mb_chann->bad_state)) { + MB_ERR(mb_chann, "Channel in bad state"); + return -EPIPE; + } + + mb_msg = kzalloc(sizeof(*mb_msg) + pkg_size, GFP_KERNEL); + if (!mb_msg) + return -ENOMEM; + + mb_msg->handle = msg->handle; + mb_msg->notify_cb = msg->notify_cb; + mb_msg->pkg_size = pkg_size; + + header = &mb_msg->pkg.header; + /* + * Hardware use total_size and size to split huge message. + * We do not support it here. Thus the values are the same. + */ + header->total_size = msg->send_size; + header->sz_ver = FIELD_PREP(MSG_BODY_SZ, msg->send_size) | + FIELD_PREP(MSG_PROTO_VER, MSG_PROTOCOL_VERSION); + header->opcode = msg->opcode; + memcpy(mb_msg->pkg.payload, msg->send_data, msg->send_size); + + ret = mailbox_acquire_msgid(mb_chann, mb_msg); + if (unlikely(ret < 0)) { + MB_ERR(mb_chann, "mailbox_acquire_msgid failed"); + goto msg_id_failed; + } + header->id = ret; + + MB_DBG(mb_chann, "opcode 0x%x size %d id 0x%x", + header->opcode, header->total_size, header->id); + + ret = mailbox_send_msg(mb_chann, mb_msg); + if (ret) { + MB_DBG(mb_chann, "Error in mailbox send msg, ret %d", ret); + goto release_id; + } + + return 0; + +release_id: + mailbox_release_msgid(mb_chann, header->id); +msg_id_failed: + kfree(mb_msg); + return ret; +} + +struct mailbox_channel * +xdna_mailbox_create_channel(struct mailbox *mb, + const struct xdna_mailbox_chann_res *x2i, + const struct xdna_mailbox_chann_res *i2x, + u32 iohub_int_addr, + int mb_irq) +{ + struct mailbox_channel *mb_chann; + int ret; + + if (!is_power_of_2(x2i->rb_size) || !is_power_of_2(i2x->rb_size)) { + pr_err("Ring buf size must be power of 2"); + return NULL; + } + + mb_chann = kzalloc(sizeof(*mb_chann), GFP_KERNEL); + if (!mb_chann) + return NULL; + + mb_chann->mb = mb; + mb_chann->msix_irq = mb_irq; + mb_chann->iohub_int_addr = iohub_int_addr; + memcpy(&mb_chann->res[CHAN_RES_X2I], x2i, sizeof(*x2i)); + memcpy(&mb_chann->res[CHAN_RES_I2X], i2x, sizeof(*i2x)); + + spin_lock_init(&mb_chann->chan_idr_lock); + idr_init(&mb_chann->chan_idr); + mb_chann->x2i_tail = mailbox_get_tailptr(mb_chann, CHAN_RES_X2I); + mb_chann->i2x_head = mailbox_get_headptr(mb_chann, CHAN_RES_I2X); + + INIT_WORK(&mb_chann->rx_work, mailbox_rx_worker); + mb_chann->work_q = create_singlethread_workqueue(MAILBOX_NAME); + if (!mb_chann->work_q) { + MB_ERR(mb_chann, "Create workqueue failed"); + goto free_and_out; + } + + /* Everything look good. Time to enable irq handler */ + ret = request_irq(mb_irq, mailbox_irq_handler, 0, MAILBOX_NAME, mb_chann); + if (ret) { + MB_ERR(mb_chann, "Failed to request irq %d ret %d", mb_irq, ret); + goto destroy_wq; + } + + mb_chann->bad_state = false; + + MB_DBG(mb_chann, "Mailbox channel created (irq: %d)", mb_chann->msix_irq); + return mb_chann; + +destroy_wq: + destroy_workqueue(mb_chann->work_q); +free_and_out: + kfree(mb_chann); + return NULL; +} + +int xdna_mailbox_destroy_channel(struct mailbox_channel *mb_chann) +{ + if (!mb_chann) + return 0; + + MB_DBG(mb_chann, "IRQ disabled and RX work cancelled"); + free_irq(mb_chann->msix_irq, mb_chann); + destroy_workqueue(mb_chann->work_q); + /* We can clean up and release resources */ + + idr_for_each(&mb_chann->chan_idr, mailbox_release_msg, mb_chann); + idr_destroy(&mb_chann->chan_idr); + + MB_DBG(mb_chann, "Mailbox channel destroyed, irq: %d", mb_chann->msix_irq); + kfree(mb_chann); + return 0; +} + +void xdna_mailbox_stop_channel(struct mailbox_channel *mb_chann) +{ + if (!mb_chann) + return; + + /* Disable an irq and wait. This might sleep. */ + disable_irq(mb_chann->msix_irq); + + /* Cancel RX work and wait for it to finish */ + cancel_work_sync(&mb_chann->rx_work); + MB_DBG(mb_chann, "IRQ disabled and RX work cancelled"); +} + +struct mailbox *xdnam_mailbox_create(struct drm_device *ddev, + const struct xdna_mailbox_res *res) +{ + struct mailbox *mb; + + mb = drmm_kzalloc(ddev, sizeof(*mb), GFP_KERNEL); + if (!mb) + return NULL; + mb->dev = ddev->dev; + + /* mailbox and ring buf base and size information */ + memcpy(&mb->res, res, sizeof(*res)); + + return mb; +} diff --git a/drivers/accel/amdxdna/amdxdna_mailbox.h b/drivers/accel/amdxdna/amdxdna_mailbox.h new file mode 100644 index 000000000000..6ab7f5424633 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox.h @@ -0,0 +1,124 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_MAILBOX_H_ +#define _AIE2_MAILBOX_H_ + +struct mailbox; +struct mailbox_channel; + +/* + * xdna_mailbox_msg - message struct + * + * @opcode: opcode for firmware + * @handle: handle used for the notify callback + * @notify_cb: callback function to notify the sender when there is response + * @send_data: pointing to sending data + * @send_size: size of the sending data + * + * The mailbox will split the sending data in to multiple firmware message if + * the size of the data is too big. This is transparent to the sender. The + * sender will receive one notification. + */ +struct xdna_mailbox_msg { + u32 opcode; + void *handle; + int (*notify_cb)(void *handle, const u32 *data, size_t size); + u8 *send_data; + size_t send_size; +}; + +/* + * xdna_mailbox_res - mailbox hardware resource + * + * @ringbuf_base: ring buffer base address + * @ringbuf_size: ring buffer size + * @mbox_base: mailbox base address + * @mbox_size: mailbox size + */ +struct xdna_mailbox_res { + u64 ringbuf_base; + size_t ringbuf_size; + u64 mbox_base; + size_t mbox_size; + const char *name; +}; + +/* + * xdna_mailbox_chann_res - resources + * + * @rb_start_addr: ring buffer start address + * @rb_size: ring buffer size + * @mb_head_ptr_reg: mailbox head pointer register + * @mb_tail_ptr_reg: mailbox tail pointer register + */ +struct xdna_mailbox_chann_res { + u32 rb_start_addr; + u32 rb_size; + u32 mb_head_ptr_reg; + u32 mb_tail_ptr_reg; +}; + +/* + * xdna_mailbox_create() -- create mailbox subsystem and initialize + * + * @ddev: device pointer + * @res: SRAM and mailbox resources + * + * Return: If success, return a handle of mailbox subsystem. + * Otherwise, return NULL pointer. + */ +struct mailbox *xdnam_mailbox_create(struct drm_device *ddev, + const struct xdna_mailbox_res *res); + +/* + * xdna_mailbox_create_channel() -- Create a mailbox channel instance + * + * @mailbox: the handle return from xdna_mailbox_create() + * @x2i: host to firmware mailbox resources + * @i2x: firmware to host mailbox resources + * @xdna_mailbox_intr_reg: register addr of MSI-X interrupt + * @mb_irq: Linux IRQ number associated with mailbox MSI-X interrupt vector index + * + * Return: If success, return a handle of mailbox channel. Otherwise, return NULL. + */ +struct mailbox_channel * +xdna_mailbox_create_channel(struct mailbox *mailbox, + const struct xdna_mailbox_chann_res *x2i, + const struct xdna_mailbox_chann_res *i2x, + u32 xdna_mailbox_intr_reg, + int mb_irq); + +/* + * xdna_mailbox_destroy_channel() -- destroy mailbox channel + * + * @mailbox_chann: the handle return from xdna_mailbox_create_channel() + * + * Return: if success, return 0. otherwise return error code + */ +int xdna_mailbox_destroy_channel(struct mailbox_channel *mailbox_chann); + +/* + * xdna_mailbox_stop_channel() -- stop mailbox channel + * + * @mailbox_chann: the handle return from xdna_mailbox_create_channel() + * + * Return: if success, return 0. otherwise return error code + */ +void xdna_mailbox_stop_channel(struct mailbox_channel *mailbox_chann); + +/* + * xdna_mailbox_send_msg() -- Send a message + * + * @mailbox_chann: Mailbox channel handle + * @msg: message struct for message information + * @tx_timeout: the timeout value for sending the message in ms. + * + * Return: If success return 0, otherwise, return error code + */ +int xdna_mailbox_send_msg(struct mailbox_channel *mailbox_chann, + const struct xdna_mailbox_msg *msg, u64 tx_timeout); + +#endif /* _AIE2_MAILBOX_ */ diff --git a/drivers/accel/amdxdna/amdxdna_mailbox_helper.c b/drivers/accel/amdxdna/amdxdna_mailbox_helper.c new file mode 100644 index 000000000000..5139a9c96a91 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox_helper.c @@ -0,0 +1,61 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_device.h> +#include <drm/drm_print.h> +#include <drm/drm_gem.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/gpu_scheduler.h> +#include <linux/completion.h> + +#include "amdxdna_gem.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_mailbox_helper.h" +#include "amdxdna_pci_drv.h" + +int xdna_msg_cb(void *handle, const u32 *data, size_t size) +{ + struct xdna_notify *cb_arg = handle; + int ret; + + if (unlikely(!data)) + goto out; + + if (unlikely(cb_arg->size != size)) { + cb_arg->error = -EINVAL; + goto out; + } + + print_hex_dump_debug("resp data: ", DUMP_PREFIX_OFFSET, + 16, 4, data, cb_arg->size, true); + memcpy(cb_arg->data, data, cb_arg->size); +out: + ret = cb_arg->error; + complete(&cb_arg->comp); + return ret; +} + +int xdna_send_msg_wait(struct amdxdna_dev *xdna, struct mailbox_channel *chann, + struct xdna_mailbox_msg *msg) +{ + struct xdna_notify *hdl = msg->handle; + int ret; + + ret = xdna_mailbox_send_msg(chann, msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(xdna, "Send message failed, ret %d", ret); + return ret; + } + + ret = wait_for_completion_timeout(&hdl->comp, + msecs_to_jiffies(RX_TIMEOUT)); + if (!ret) { + XDNA_ERR(xdna, "Wait for completion timeout"); + return -ETIME; + } + + return hdl->error; +} diff --git a/drivers/accel/amdxdna/amdxdna_mailbox_helper.h b/drivers/accel/amdxdna/amdxdna_mailbox_helper.h new file mode 100644 index 000000000000..23e1317b79fe --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox_helper.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AMDXDNA_MAILBOX_HELPER_H +#define _AMDXDNA_MAILBOX_HELPER_H + +#define TX_TIMEOUT 2000 /* milliseconds */ +#define RX_TIMEOUT 5000 /* milliseconds */ + +struct amdxdna_dev; + +struct xdna_notify { + struct completion comp; + u32 *data; + size_t size; + int error; +}; + +#define DECLARE_XDNA_MSG_COMMON(name, op, status) \ + struct name##_req req = { 0 }; \ + struct name##_resp resp = { status }; \ + struct xdna_notify hdl = { \ + .error = 0, \ + .data = (u32 *)&resp, \ + .size = sizeof(resp), \ + .comp = COMPLETION_INITIALIZER_ONSTACK(hdl.comp), \ + }; \ + struct xdna_mailbox_msg msg = { \ + .send_data = (u8 *)&req, \ + .send_size = sizeof(req), \ + .handle = &hdl, \ + .opcode = op, \ + .notify_cb = xdna_msg_cb, \ + } + +int xdna_msg_cb(void *handle, const u32 *data, size_t size); +int xdna_send_msg_wait(struct amdxdna_dev *xdna, struct mailbox_channel *chann, + struct xdna_mailbox_msg *msg); + +#endif /* _AMDXDNA_MAILBOX_HELPER_H */ diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdna/amdxdna_pci_drv.c new file mode 100644 index 000000000000..02533732d4ca --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c @@ -0,0 +1,409 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_accel.h> +#include <drm/drm_drv.h> +#include <drm/drm_gem.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/drm_ioctl.h> +#include <drm/drm_managed.h> +#include <drm/gpu_scheduler.h> +#include <linux/iommu.h> +#include <linux/pci.h> +#include <linux/pm_runtime.h> + +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_pci_drv.h" + +#define AMDXDNA_AUTOSUSPEND_DELAY 5000 /* milliseconds */ + +/* + * Bind the driver base on (vendor_id, device_id) pair and later use the + * (device_id, rev_id) pair as a key to select the devices. The devices with + * same device_id have very similar interface to host driver. + */ +static const struct pci_device_id pci_ids[] = { + { PCI_DEVICE(PCI_VENDOR_ID_AMD, 0x1502) }, + { PCI_DEVICE(PCI_VENDOR_ID_AMD, 0x17f0) }, + {0} +}; + +MODULE_DEVICE_TABLE(pci, pci_ids); + +static const struct amdxdna_device_id amdxdna_ids[] = { + { 0x1502, 0x0, &dev_npu1_info }, + { 0x17f0, 0x0, &dev_npu2_info }, + { 0x17f0, 0x10, &dev_npu4_info }, + { 0x17f0, 0x11, &dev_npu5_info }, + {0} +}; + +static int amdxdna_drm_open(struct drm_device *ddev, struct drm_file *filp) +{ + struct amdxdna_dev *xdna = to_xdna_dev(ddev); + struct amdxdna_client *client; + int ret; + + ret = pm_runtime_resume_and_get(ddev->dev); + if (ret) { + XDNA_ERR(xdna, "Failed to get rpm, ret %d", ret); + return ret; + } + + client = kzalloc(sizeof(*client), GFP_KERNEL); + if (!client) { + ret = -ENOMEM; + goto put_rpm; + } + + client->pid = pid_nr(filp->pid); + client->xdna = xdna; + + client->sva = iommu_sva_bind_device(xdna->ddev.dev, current->mm); + if (IS_ERR(client->sva)) { + ret = PTR_ERR(client->sva); + XDNA_ERR(xdna, "SVA bind device failed, ret %d", ret); + goto failed; + } + client->pasid = iommu_sva_get_pasid(client->sva); + if (client->pasid == IOMMU_PASID_INVALID) { + XDNA_ERR(xdna, "SVA get pasid failed"); + ret = -ENODEV; + goto unbind_sva; + } + mutex_init(&client->hwctx_lock); + init_srcu_struct(&client->hwctx_srcu); + idr_init_base(&client->hwctx_idr, AMDXDNA_INVALID_CTX_HANDLE + 1); + mutex_init(&client->mm_lock); + + mutex_lock(&xdna->dev_lock); + list_add_tail(&client->node, &xdna->client_list); + mutex_unlock(&xdna->dev_lock); + + filp->driver_priv = client; + client->filp = filp; + + XDNA_DBG(xdna, "pid %d opened", client->pid); + return 0; + +unbind_sva: + iommu_sva_unbind_device(client->sva); +failed: + kfree(client); +put_rpm: + pm_runtime_mark_last_busy(ddev->dev); + pm_runtime_put_autosuspend(ddev->dev); + + return ret; +} + +static void amdxdna_drm_close(struct drm_device *ddev, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = to_xdna_dev(ddev); + + XDNA_DBG(xdna, "closing pid %d", client->pid); + + idr_destroy(&client->hwctx_idr); + cleanup_srcu_struct(&client->hwctx_srcu); + mutex_destroy(&client->hwctx_lock); + mutex_destroy(&client->mm_lock); + if (client->dev_heap) + drm_gem_object_put(to_gobj(client->dev_heap)); + + iommu_sva_unbind_device(client->sva); + + XDNA_DBG(xdna, "pid %d closed", client->pid); + kfree(client); + pm_runtime_mark_last_busy(ddev->dev); + pm_runtime_put_autosuspend(ddev->dev); +} + +static int amdxdna_flush(struct file *f, fl_owner_t id) +{ + struct drm_file *filp = f->private_data; + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = client->xdna; + int idx; + + XDNA_DBG(xdna, "PID %d flushing...", client->pid); + if (!drm_dev_enter(&xdna->ddev, &idx)) + return 0; + + mutex_lock(&xdna->dev_lock); + list_del_init(&client->node); + mutex_unlock(&xdna->dev_lock); + amdxdna_hwctx_remove_all(client); + + drm_dev_exit(idx); + return 0; +} + +static int amdxdna_drm_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_drm_get_info *args = data; + int ret; + + if (!xdna->dev_info->ops->get_aie_info) + return -EOPNOTSUPP; + + XDNA_DBG(xdna, "Request parameter %u", args->param); + mutex_lock(&xdna->dev_lock); + ret = xdna->dev_info->ops->get_aie_info(client, args); + mutex_unlock(&xdna->dev_lock); + return ret; +} + +static const struct drm_ioctl_desc amdxdna_drm_ioctls[] = { + /* Context */ + DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_HWCTX, amdxdna_drm_create_hwctx_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_DESTROY_HWCTX, amdxdna_drm_destroy_hwctx_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_CONFIG_HWCTX, amdxdna_drm_config_hwctx_ioctl, 0), + /* BO */ + DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_BO, amdxdna_drm_create_bo_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_GET_BO_INFO, amdxdna_drm_get_bo_info_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_SYNC_BO, amdxdna_drm_sync_bo_ioctl, 0), + /* Execution */ + DRM_IOCTL_DEF_DRV(AMDXDNA_EXEC_CMD, amdxdna_drm_submit_cmd_ioctl, 0), + /* AIE hardware */ + DRM_IOCTL_DEF_DRV(AMDXDNA_GET_INFO, amdxdna_drm_get_info_ioctl, 0), +}; + +static const struct file_operations amdxdna_fops = { + .owner = THIS_MODULE, + .open = accel_open, + .release = drm_release, + .flush = amdxdna_flush, + .unlocked_ioctl = drm_ioctl, + .compat_ioctl = drm_compat_ioctl, + .poll = drm_poll, + .read = drm_read, + .llseek = noop_llseek, + .mmap = drm_gem_mmap, + .fop_flags = FOP_UNSIGNED_OFFSET, +}; + +const struct drm_driver amdxdna_drm_drv = { + .driver_features = DRIVER_GEM | DRIVER_COMPUTE_ACCEL | + DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE, + .fops = &amdxdna_fops, + .name = "amdxdna_accel_driver", + .desc = "AMD XDNA DRM implementation", + .open = amdxdna_drm_open, + .postclose = amdxdna_drm_close, + .ioctls = amdxdna_drm_ioctls, + .num_ioctls = ARRAY_SIZE(amdxdna_drm_ioctls), + + .gem_create_object = amdxdna_gem_create_object_cb, +}; + +static const struct amdxdna_dev_info * +amdxdna_get_dev_info(struct pci_dev *pdev) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(amdxdna_ids); i++) { + if (pdev->device == amdxdna_ids[i].device && + pdev->revision == amdxdna_ids[i].revision) + return amdxdna_ids[i].dev_info; + } + return NULL; +} + +static int amdxdna_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct device *dev = &pdev->dev; + struct amdxdna_dev *xdna; + int ret; + + xdna = devm_drm_dev_alloc(dev, &amdxdna_drm_drv, typeof(*xdna), ddev); + if (IS_ERR(xdna)) + return PTR_ERR(xdna); + + xdna->dev_info = amdxdna_get_dev_info(pdev); + if (!xdna->dev_info) + return -ENODEV; + + drmm_mutex_init(&xdna->ddev, &xdna->dev_lock); + init_rwsem(&xdna->notifier_lock); + INIT_LIST_HEAD(&xdna->client_list); + pci_set_drvdata(pdev, xdna); + + if (IS_ENABLED(CONFIG_LOCKDEP)) { + fs_reclaim_acquire(GFP_KERNEL); + might_lock(&xdna->notifier_lock); + fs_reclaim_release(GFP_KERNEL); + } + + mutex_lock(&xdna->dev_lock); + ret = xdna->dev_info->ops->init(xdna); + mutex_unlock(&xdna->dev_lock); + if (ret) { + XDNA_ERR(xdna, "Hardware init failed, ret %d", ret); + return ret; + } + + ret = amdxdna_sysfs_init(xdna); + if (ret) { + XDNA_ERR(xdna, "Create amdxdna attrs failed: %d", ret); + goto failed_dev_fini; + } + + pm_runtime_set_autosuspend_delay(dev, AMDXDNA_AUTOSUSPEND_DELAY); + pm_runtime_use_autosuspend(dev); + pm_runtime_allow(dev); + + ret = drm_dev_register(&xdna->ddev, 0); + if (ret) { + XDNA_ERR(xdna, "DRM register failed, ret %d", ret); + pm_runtime_forbid(dev); + goto failed_sysfs_fini; + } + + pm_runtime_mark_last_busy(dev); + pm_runtime_put_autosuspend(dev); + return 0; + +failed_sysfs_fini: + amdxdna_sysfs_fini(xdna); +failed_dev_fini: + mutex_lock(&xdna->dev_lock); + xdna->dev_info->ops->fini(xdna); + mutex_unlock(&xdna->dev_lock); + return ret; +} + +static void amdxdna_remove(struct pci_dev *pdev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(pdev); + struct device *dev = &pdev->dev; + struct amdxdna_client *client; + + pm_runtime_get_noresume(dev); + pm_runtime_forbid(dev); + + drm_dev_unplug(&xdna->ddev); + amdxdna_sysfs_fini(xdna); + + mutex_lock(&xdna->dev_lock); + client = list_first_entry_or_null(&xdna->client_list, + struct amdxdna_client, node); + while (client) { + list_del_init(&client->node); + mutex_unlock(&xdna->dev_lock); + + amdxdna_hwctx_remove_all(client); + + mutex_lock(&xdna->dev_lock); + client = list_first_entry_or_null(&xdna->client_list, + struct amdxdna_client, node); + } + + xdna->dev_info->ops->fini(xdna); + mutex_unlock(&xdna->dev_lock); +} + +static int amdxdna_dev_suspend_nolock(struct amdxdna_dev *xdna) +{ + if (xdna->dev_info->ops->suspend) + xdna->dev_info->ops->suspend(xdna); + + return 0; +} + +static int amdxdna_dev_resume_nolock(struct amdxdna_dev *xdna) +{ + if (xdna->dev_info->ops->resume) + return xdna->dev_info->ops->resume(xdna); + + return 0; +} + +static int amdxdna_pmops_suspend(struct device *dev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(to_pci_dev(dev)); + struct amdxdna_client *client; + + mutex_lock(&xdna->dev_lock); + list_for_each_entry(client, &xdna->client_list, node) + amdxdna_hwctx_suspend(client); + + amdxdna_dev_suspend_nolock(xdna); + mutex_unlock(&xdna->dev_lock); + + return 0; +} + +static int amdxdna_pmops_resume(struct device *dev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(to_pci_dev(dev)); + struct amdxdna_client *client; + int ret; + + XDNA_INFO(xdna, "firmware resuming..."); + mutex_lock(&xdna->dev_lock); + ret = amdxdna_dev_resume_nolock(xdna); + if (ret) { + XDNA_ERR(xdna, "resume NPU firmware failed"); + mutex_unlock(&xdna->dev_lock); + return ret; + } + + XDNA_INFO(xdna, "hardware context resuming..."); + list_for_each_entry(client, &xdna->client_list, node) + amdxdna_hwctx_resume(client); + mutex_unlock(&xdna->dev_lock); + + return 0; +} + +static int amdxdna_rpmops_suspend(struct device *dev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(to_pci_dev(dev)); + int ret; + + mutex_lock(&xdna->dev_lock); + ret = amdxdna_dev_suspend_nolock(xdna); + mutex_unlock(&xdna->dev_lock); + + XDNA_DBG(xdna, "Runtime suspend done ret: %d", ret); + return ret; +} + +static int amdxdna_rpmops_resume(struct device *dev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(to_pci_dev(dev)); + int ret; + + mutex_lock(&xdna->dev_lock); + ret = amdxdna_dev_resume_nolock(xdna); + mutex_unlock(&xdna->dev_lock); + + XDNA_DBG(xdna, "Runtime resume done ret: %d", ret); + return ret; +} + +static const struct dev_pm_ops amdxdna_pm_ops = { + SET_SYSTEM_SLEEP_PM_OPS(amdxdna_pmops_suspend, amdxdna_pmops_resume) + SET_RUNTIME_PM_OPS(amdxdna_rpmops_suspend, amdxdna_rpmops_resume, NULL) +}; + +static struct pci_driver amdxdna_pci_driver = { + .name = KBUILD_MODNAME, + .id_table = pci_ids, + .probe = amdxdna_probe, + .remove = amdxdna_remove, + .driver.pm = &amdxdna_pm_ops, +}; + +module_pci_driver(amdxdna_pci_driver); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("XRT Team <runtimeca39d@amd.com>"); +MODULE_DESCRIPTION("amdxdna driver"); diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdna/amdxdna_pci_drv.h new file mode 100644 index 000000000000..c50d65a050ad --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h @@ -0,0 +1,123 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AMDXDNA_PCI_DRV_H_ +#define _AMDXDNA_PCI_DRV_H_ + +#define XDNA_INFO(xdna, fmt, args...) drm_info(&(xdna)->ddev, fmt, ##args) +#define XDNA_WARN(xdna, fmt, args...) drm_warn(&(xdna)->ddev, "%s: "fmt, __func__, ##args) +#define XDNA_ERR(xdna, fmt, args...) drm_err(&(xdna)->ddev, "%s: "fmt, __func__, ##args) +#define XDNA_DBG(xdna, fmt, args...) drm_dbg(&(xdna)->ddev, fmt, ##args) +#define XDNA_INFO_ONCE(xdna, fmt, args...) drm_info_once(&(xdna)->ddev, fmt, ##args) + +#define to_xdna_dev(drm_dev) \ + ((struct amdxdna_dev *)container_of(drm_dev, struct amdxdna_dev, ddev)) + +extern const struct drm_driver amdxdna_drm_drv; + +struct amdxdna_client; +struct amdxdna_dev; +struct amdxdna_drm_get_info; +struct amdxdna_gem_obj; +struct amdxdna_hwctx; +struct amdxdna_sched_job; + +/* + * struct amdxdna_dev_ops - Device hardware operation callbacks + */ +struct amdxdna_dev_ops { + int (*init)(struct amdxdna_dev *xdna); + void (*fini)(struct amdxdna_dev *xdna); + int (*resume)(struct amdxdna_dev *xdna); + void (*suspend)(struct amdxdna_dev *xdna); + int (*hwctx_init)(struct amdxdna_hwctx *hwctx); + void (*hwctx_fini)(struct amdxdna_hwctx *hwctx); + int (*hwctx_config)(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *buf, u32 size); + void (*hmm_invalidate)(struct amdxdna_gem_obj *abo, unsigned long cur_seq); + void (*hwctx_suspend)(struct amdxdna_hwctx *hwctx); + void (*hwctx_resume)(struct amdxdna_hwctx *hwctx); + int (*cmd_submit)(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, u64 *seq); + int (*get_aie_info)(struct amdxdna_client *client, struct amdxdna_drm_get_info *args); +}; + +/* + * struct amdxdna_dev_info - Device hardware information + * Record device static information, like reg, mbox, PSP, SMU bar index + */ +struct amdxdna_dev_info { + int reg_bar; + int mbox_bar; + int sram_bar; + int psp_bar; + int smu_bar; + int device_type; + int first_col; + u32 dev_mem_buf_shift; + u64 dev_mem_base; + size_t dev_mem_size; + char *vbnv; + const struct amdxdna_dev_priv *dev_priv; + const struct amdxdna_dev_ops *ops; +}; + +struct amdxdna_fw_ver { + u32 major; + u32 minor; + u32 sub; + u32 build; +}; + +struct amdxdna_dev { + struct drm_device ddev; + struct amdxdna_dev_hdl *dev_handle; + const struct amdxdna_dev_info *dev_info; + void *xrs_hdl; + + struct mutex dev_lock; /* per device lock */ + struct list_head client_list; + struct amdxdna_fw_ver fw_ver; + struct rw_semaphore notifier_lock; /* for mmu notifier*/ +}; + +/* + * struct amdxdna_device_id - PCI device info + */ +struct amdxdna_device_id { + unsigned short device; + u8 revision; + const struct amdxdna_dev_info *dev_info; +}; + +/* + * struct amdxdna_client - amdxdna client + * A per fd data structure for managing context and other user process stuffs. + */ +struct amdxdna_client { + struct list_head node; + pid_t pid; + struct mutex hwctx_lock; /* protect hwctx */ + /* do NOT wait this srcu when hwctx_lock is held */ + struct srcu_struct hwctx_srcu; + struct idr hwctx_idr; + struct amdxdna_dev *xdna; + struct drm_file *filp; + + struct mutex mm_lock; /* protect memory related */ + struct amdxdna_gem_obj *dev_heap; + + struct iommu_sva *sva; + int pasid; +}; + +/* Add device info below */ +extern const struct amdxdna_dev_info dev_npu1_info; +extern const struct amdxdna_dev_info dev_npu2_info; +extern const struct amdxdna_dev_info dev_npu4_info; +extern const struct amdxdna_dev_info dev_npu5_info; + +int amdxdna_sysfs_init(struct amdxdna_dev *xdna); +void amdxdna_sysfs_fini(struct amdxdna_dev *xdna); + +#endif /* _AMDXDNA_PCI_DRV_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_sysfs.c b/drivers/accel/amdxdna/amdxdna_sysfs.c new file mode 100644 index 000000000000..f27e4ee960a0 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_sysfs.c @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_device.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/drm_print.h> +#include <drm/gpu_scheduler.h> +#include <linux/types.h> + +#include "amdxdna_gem.h" +#include "amdxdna_pci_drv.h" + +static ssize_t vbnv_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct amdxdna_dev *xdna = dev_get_drvdata(dev); + + return sprintf(buf, "%s\n", xdna->dev_info->vbnv); +} +static DEVICE_ATTR_RO(vbnv); + +static ssize_t device_type_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct amdxdna_dev *xdna = dev_get_drvdata(dev); + + return sprintf(buf, "%d\n", xdna->dev_info->device_type); +} +static DEVICE_ATTR_RO(device_type); + +static ssize_t fw_version_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct amdxdna_dev *xdna = dev_get_drvdata(dev); + + return sprintf(buf, "%d.%d.%d.%d\n", xdna->fw_ver.major, + xdna->fw_ver.minor, xdna->fw_ver.sub, + xdna->fw_ver.build); +} +static DEVICE_ATTR_RO(fw_version); + +static struct attribute *amdxdna_attrs[] = { + &dev_attr_device_type.attr, + &dev_attr_vbnv.attr, + &dev_attr_fw_version.attr, + NULL, +}; + +static struct attribute_group amdxdna_attr_group = { + .attrs = amdxdna_attrs, +}; + +int amdxdna_sysfs_init(struct amdxdna_dev *xdna) +{ + int ret; + + ret = sysfs_create_group(&xdna->ddev.dev->kobj, &amdxdna_attr_group); + if (ret) + XDNA_ERR(xdna, "Create attr group failed"); + + return ret; +} + +void amdxdna_sysfs_fini(struct amdxdna_dev *xdna) +{ + sysfs_remove_group(&xdna->ddev.dev->kobj, &amdxdna_attr_group); +} diff --git a/drivers/accel/amdxdna/npu1_regs.c b/drivers/accel/amdxdna/npu1_regs.c new file mode 100644 index 000000000000..f00c50461b09 --- /dev/null +++ b/drivers/accel/amdxdna/npu1_regs.c @@ -0,0 +1,101 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_device.h> +#include <drm/gpu_scheduler.h> +#include <linux/sizes.h> + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +/* Address definition from NPU1 docs */ +#define MPNPU_PUB_SEC_INTR 0x3010090 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010094 +#define MPNPU_PUB_SCRATCH2 0x30100A0 +#define MPNPU_PUB_SCRATCH3 0x30100A4 +#define MPNPU_PUB_SCRATCH4 0x30100A8 +#define MPNPU_PUB_SCRATCH5 0x30100AC +#define MPNPU_PUB_SCRATCH6 0x30100B0 +#define MPNPU_PUB_SCRATCH7 0x30100B4 +#define MPNPU_PUB_SCRATCH9 0x30100BC + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x30A0000 +#define MPNPU_SRAM_X2I_MAILBOX_1 0x30A2000 +#define MPNPU_SRAM_I2X_MAILBOX_15 0x30BF000 + +#define MPNPU_APERTURE0_BASE 0x3000000 +#define MPNPU_APERTURE1_BASE 0x3080000 +#define MPNPU_APERTURE2_BASE 0x30C0000 + +/* PCIe BAR Index for NPU1 */ +#define NPU1_REG_BAR_INDEX 0 +#define NPU1_MBOX_BAR_INDEX 4 +#define NPU1_PSP_BAR_INDEX 0 +#define NPU1_SMU_BAR_INDEX 0 +#define NPU1_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU1_REG_BAR_BASE MPNPU_APERTURE0_BASE +#define NPU1_MBOX_BAR_BASE MPNPU_APERTURE2_BASE +#define NPU1_PSP_BAR_BASE MPNPU_APERTURE0_BASE +#define NPU1_SMU_BAR_BASE MPNPU_APERTURE0_BASE +#define NPU1_SRAM_BAR_BASE MPNPU_APERTURE1_BASE + +#define NPU1_RT_CFG_TYPE_PDI_LOAD 2 +#define NPU1_RT_CFG_VAL_PDI_LOAD_MGMT 0 +#define NPU1_RT_CFG_VAL_PDI_LOAD_APP 1 + +#define NPU1_MPNPUCLK_FREQ_MAX 600 +#define NPU1_HCLK_FREQ_MAX 1024 + +const struct amdxdna_dev_priv npu1_dev_priv = { + .fw_path = "amdnpu/1502_00/npu.sbin", + .protocol_major = 0x5, + .protocol_minor = 0x1, + .rt_config = {NPU1_RT_CFG_TYPE_PDI_LOAD, NPU1_RT_CFG_VAL_PDI_LOAD_APP}, + .col_align = COL_ALIGN_NONE, + .mbox_dev_addr = NPU1_MBOX_BAR_BASE, + .mbox_size = 0, /* Use BAR size */ + .sram_dev_addr = NPU1_SRAM_BAR_BASE, + .sram_offs = { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU1_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU1_SRAM, MPNPU_SRAM_I2X_MAILBOX_15), + }, + .psp_regs_off = { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU1_PSP, MPNPU_PUB_SCRATCH2), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU1_PSP, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU1_PSP, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU1_PSP, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU1_PSP, MPNPU_PUB_SEC_INTR), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU1_PSP, MPNPU_PUB_SCRATCH2), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU1_PSP, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off = { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU1_SMU, MPNPU_PUB_SCRATCH5), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU1_SMU, MPNPU_PUB_SCRATCH7), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU1_SMU, MPNPU_PUB_PWRMGMT_INTR), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU1_SMU, MPNPU_PUB_SCRATCH6), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU1_SMU, MPNPU_PUB_SCRATCH7), + }, + .smu_mpnpuclk_freq_max = NPU1_MPNPUCLK_FREQ_MAX, + .smu_hclk_freq_max = NPU1_HCLK_FREQ_MAX, +}; + +const struct amdxdna_dev_info dev_npu1_info = { + .reg_bar = NPU1_REG_BAR_INDEX, + .mbox_bar = NPU1_MBOX_BAR_INDEX, + .sram_bar = NPU1_SRAM_BAR_INDEX, + .psp_bar = NPU1_PSP_BAR_INDEX, + .smu_bar = NPU1_SMU_BAR_INDEX, + .first_col = 1, + .dev_mem_buf_shift = 15, /* 32 KiB aligned */ + .dev_mem_base = AIE2_DEVM_BASE, + .dev_mem_size = AIE2_DEVM_SIZE, + .vbnv = "RyzenAI-npu1", + .device_type = AMDXDNA_DEV_TYPE_KMQ, + .dev_priv = &npu1_dev_priv, + .ops = &aie2_ops, +}; diff --git a/drivers/accel/amdxdna/npu2_regs.c b/drivers/accel/amdxdna/npu2_regs.c new file mode 100644 index 000000000000..00cb381031d2 --- /dev/null +++ b/drivers/accel/amdxdna/npu2_regs.c @@ -0,0 +1,118 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_device.h> +#include <drm/gpu_scheduler.h> +#include <linux/sizes.h> + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU2 */ +#define NPU2_REG_BAR_INDEX 0 +#define NPU2_MBOX_BAR_INDEX 0 +#define NPU2_PSP_BAR_INDEX 4 +#define NPU2_SMU_BAR_INDEX 5 +#define NPU2_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU2_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU2_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU2_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU2_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU2_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +#define NPU2_RT_CFG_TYPE_PDI_LOAD 5 +#define NPU2_RT_CFG_VAL_PDI_LOAD_MGMT 0 +#define NPU2_RT_CFG_VAL_PDI_LOAD_APP 1 + +#define NPU2_MPNPUCLK_FREQ_MAX 1267 +#define NPU2_HCLK_FREQ_MAX 1800 + +const struct amdxdna_dev_priv npu2_dev_priv = { + .fw_path = "amdnpu/17f0_00/npu.sbin", + .protocol_major = 0x6, + .protocol_minor = 0x1, + .rt_config = {NPU2_RT_CFG_TYPE_PDI_LOAD, NPU2_RT_CFG_VAL_PDI_LOAD_APP}, + .col_align = COL_ALIGN_NATURE, + .mbox_dev_addr = NPU2_MBOX_BAR_BASE, + .mbox_size = 0, /* Use BAR size */ + .sram_dev_addr = NPU2_SRAM_BAR_BASE, + .sram_offs = { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU2_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU2_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off = { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU2_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU2_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU2_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU2_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU2_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU2_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU2_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off = { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU2_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU2_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU2_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU2_SMU, MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU2_SMU, MP1_C2PMSG_60), + }, + .smu_mpnpuclk_freq_max = NPU2_MPNPUCLK_FREQ_MAX, + .smu_hclk_freq_max = NPU2_HCLK_FREQ_MAX, +}; + +const struct amdxdna_dev_info dev_npu2_info = { + .reg_bar = NPU2_REG_BAR_INDEX, + .mbox_bar = NPU2_MBOX_BAR_INDEX, + .sram_bar = NPU2_SRAM_BAR_INDEX, + .psp_bar = NPU2_PSP_BAR_INDEX, + .smu_bar = NPU2_SMU_BAR_INDEX, + .first_col = 0, + .dev_mem_buf_shift = 15, /* 32 KiB aligned */ + .dev_mem_base = AIE2_DEVM_BASE, + .dev_mem_size = AIE2_DEVM_SIZE, + .vbnv = "RyzenAI-npu2", + .device_type = AMDXDNA_DEV_TYPE_KMQ, + .dev_priv = &npu2_dev_priv, + .ops = &aie2_ops, /* NPU2 can share NPU1's callback */ +}; diff --git a/drivers/accel/amdxdna/npu4_regs.c b/drivers/accel/amdxdna/npu4_regs.c new file mode 100644 index 000000000000..b6dae9667cca --- /dev/null +++ b/drivers/accel/amdxdna/npu4_regs.c @@ -0,0 +1,118 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_device.h> +#include <drm/gpu_scheduler.h> +#include <linux/sizes.h> + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU4 */ +#define NPU4_REG_BAR_INDEX 0 +#define NPU4_MBOX_BAR_INDEX 0 +#define NPU4_PSP_BAR_INDEX 4 +#define NPU4_SMU_BAR_INDEX 5 +#define NPU4_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU4_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU4_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU4_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU4_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU4_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +#define NPU4_RT_CFG_TYPE_PDI_LOAD 5 +#define NPU4_RT_CFG_VAL_PDI_LOAD_MGMT 0 +#define NPU4_RT_CFG_VAL_PDI_LOAD_APP 1 + +#define NPU4_MPNPUCLK_FREQ_MAX 1267 +#define NPU4_HCLK_FREQ_MAX 1800 + +const struct amdxdna_dev_priv npu4_dev_priv = { + .fw_path = "amdnpu/17f0_10/npu.sbin", + .protocol_major = 0x6, + .protocol_minor = 0x1, + .rt_config = {NPU4_RT_CFG_TYPE_PDI_LOAD, NPU4_RT_CFG_VAL_PDI_LOAD_APP}, + .col_align = COL_ALIGN_NATURE, + .mbox_dev_addr = NPU4_MBOX_BAR_BASE, + .mbox_size = 0, /* Use BAR size */ + .sram_dev_addr = NPU4_SRAM_BAR_BASE, + .sram_offs = { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU4_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU4_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off = { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU4_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU4_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU4_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU4_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU4_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU4_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU4_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off = { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU4_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU4_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU4_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU4_SMU, MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU4_SMU, MP1_C2PMSG_60), + }, + .smu_mpnpuclk_freq_max = NPU4_MPNPUCLK_FREQ_MAX, + .smu_hclk_freq_max = NPU4_HCLK_FREQ_MAX, +}; + +const struct amdxdna_dev_info dev_npu4_info = { + .reg_bar = NPU4_REG_BAR_INDEX, + .mbox_bar = NPU4_MBOX_BAR_INDEX, + .sram_bar = NPU4_SRAM_BAR_INDEX, + .psp_bar = NPU4_PSP_BAR_INDEX, + .smu_bar = NPU4_SMU_BAR_INDEX, + .first_col = 0, + .dev_mem_buf_shift = 15, /* 32 KiB aligned */ + .dev_mem_base = AIE2_DEVM_BASE, + .dev_mem_size = AIE2_DEVM_SIZE, + .vbnv = "RyzenAI-npu4", + .device_type = AMDXDNA_DEV_TYPE_KMQ, + .dev_priv = &npu4_dev_priv, + .ops = &aie2_ops, /* NPU4 can share NPU1's callback */ +}; diff --git a/drivers/accel/amdxdna/npu5_regs.c b/drivers/accel/amdxdna/npu5_regs.c new file mode 100644 index 000000000000..bed1baf8e160 --- /dev/null +++ b/drivers/accel/amdxdna/npu5_regs.c @@ -0,0 +1,118 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include <drm/amdxdna_accel.h> +#include <drm/drm_device.h> +#include <drm/gpu_scheduler.h> +#include <linux/sizes.h> + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU5 */ +#define NPU5_REG_BAR_INDEX 0 +#define NPU5_MBOX_BAR_INDEX 0 +#define NPU5_PSP_BAR_INDEX 4 +#define NPU5_SMU_BAR_INDEX 5 +#define NPU5_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU5_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU5_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU5_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU5_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU5_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +#define NPU5_RT_CFG_TYPE_PDI_LOAD 5 +#define NPU5_RT_CFG_VAL_PDI_LOAD_MGMT 0 +#define NPU5_RT_CFG_VAL_PDI_LOAD_APP 1 + +#define NPU5_MPNPUCLK_FREQ_MAX 1267 +#define NPU5_HCLK_FREQ_MAX 1800 + +const struct amdxdna_dev_priv npu5_dev_priv = { + .fw_path = "amdnpu/17f0_11/npu.sbin", + .protocol_major = 0x6, + .protocol_minor = 0x1, + .rt_config = {NPU5_RT_CFG_TYPE_PDI_LOAD, NPU5_RT_CFG_VAL_PDI_LOAD_APP}, + .col_align = COL_ALIGN_NATURE, + .mbox_dev_addr = NPU5_MBOX_BAR_BASE, + .mbox_size = 0, /* Use BAR size */ + .sram_dev_addr = NPU5_SRAM_BAR_BASE, + .sram_offs = { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU5_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU5_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off = { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU5_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU5_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU5_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU5_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU5_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU5_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU5_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off = { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU5_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU5_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU5_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU5_SMU, MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU5_SMU, MP1_C2PMSG_60), + }, + .smu_mpnpuclk_freq_max = NPU5_MPNPUCLK_FREQ_MAX, + .smu_hclk_freq_max = NPU5_HCLK_FREQ_MAX, +}; + +const struct amdxdna_dev_info dev_npu5_info = { + .reg_bar = NPU5_REG_BAR_INDEX, + .mbox_bar = NPU5_MBOX_BAR_INDEX, + .sram_bar = NPU5_SRAM_BAR_INDEX, + .psp_bar = NPU5_PSP_BAR_INDEX, + .smu_bar = NPU5_SMU_BAR_INDEX, + .first_col = 0, + .dev_mem_buf_shift = 15, /* 32 KiB aligned */ + .dev_mem_base = AIE2_DEVM_BASE, + .dev_mem_size = AIE2_DEVM_SIZE, + .vbnv = "RyzenAI-npu5", + .device_type = AMDXDNA_DEV_TYPE_KMQ, + .dev_priv = &npu5_dev_priv, + .ops = &aie2_ops, +}; diff --git a/drivers/accel/habanalabs/common/habanalabs_drv.c b/drivers/accel/habanalabs/common/habanalabs_drv.c index 708dfd10f39c..5409b2c656c8 100644 --- a/drivers/accel/habanalabs/common/habanalabs_drv.c +++ b/drivers/accel/habanalabs/common/habanalabs_drv.c @@ -101,7 +101,6 @@ static const struct drm_driver hl_driver = { .major = LINUX_VERSION_MAJOR, .minor = LINUX_VERSION_PATCHLEVEL, .patchlevel = LINUX_VERSION_SUBLEVEL, - .date = "20190505", .fops = &hl_fops, .open = hl_device_open, diff --git a/drivers/accel/ivpu/ivpu_drv.c b/drivers/accel/ivpu/ivpu_drv.c index ca2bf47ce248..1e8ffbe25eee 100644 --- a/drivers/accel/ivpu/ivpu_drv.c +++ b/drivers/accel/ivpu/ivpu_drv.c @@ -458,15 +458,7 @@ static const struct drm_driver driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, -#ifdef DRIVER_DATE - .date = DRIVER_DATE, - .major = DRIVER_MAJOR, - .minor = DRIVER_MINOR, - .patchlevel = DRIVER_PATCHLEVEL, -#else - .date = UTS_RELEASE, .major = 1, -#endif }; static void ivpu_context_abort_invalid(struct ivpu_device *vdev) diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c index dbc0711e28d1..6821051dfa3a 100644 --- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -78,8 +78,8 @@ static int ivpu_resume(struct ivpu_device *vdev) int ret; retry: - pci_restore_state(to_pci_dev(vdev->drm.dev)); pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D0); + pci_restore_state(to_pci_dev(vdev->drm.dev)); ret = ivpu_hw_power_up(vdev); if (ret) { diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c index 4ddf89308ff5..81819b9ef8d4 100644 --- a/drivers/accel/qaic/qaic_drv.c +++ b/drivers/accel/qaic/qaic_drv.c @@ -208,7 +208,6 @@ static const struct drm_driver qaic_accel_driver = { .name = QAIC_NAME, .desc = QAIC_DESC, - .date = "20190618", .fops = &qaic_accel_fops, .open = qaic_open, diff --git a/drivers/accel/qaic/sahara.c b/drivers/accel/qaic/sahara.c index 6d772143d612..21d58aed0deb 100644 --- a/drivers/accel/qaic/sahara.c +++ b/drivers/accel/qaic/sahara.c @@ -772,8 +772,7 @@ static void sahara_mhi_remove(struct mhi_device *mhi_dev) cancel_work_sync(&context->fw_work); cancel_work_sync(&context->dump_work); - if (context->mem_dump) - vfree(context->mem_dump); + vfree(context->mem_dump); sahara_release_image(context); mhi_unprepare_from_transfer(mhi_dev); } diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 5504721007cc..698981a2ed51 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -217,77 +217,7 @@ config DRM_CLIENT option. Drivers that support the default clients should select DRM_CLIENT_SELECTION instead. -config DRM_CLIENT_LIB - tristate - depends on DRM - select DRM_KMS_HELPER if DRM_FBDEV_EMULATION - select FB_CORE if DRM_FBDEV_EMULATION - help - This option enables the DRM client library and selects all - modules and components according to the enabled clients. - -config DRM_CLIENT_SELECTION - tristate - depends on DRM - select DRM_CLIENT_LIB if DRM_FBDEV_EMULATION - help - Drivers that support in-kernel DRM clients have to select this - option. - -config DRM_CLIENT_SETUP - bool - depends on DRM_CLIENT_SELECTION - help - Enables the DRM client selection. DRM drivers that support the - default clients should select DRM_CLIENT_SELECTION instead. - -menu "Supported DRM clients" - depends on DRM_CLIENT_SELECTION - -config DRM_FBDEV_EMULATION - bool "Enable legacy fbdev support for your modesetting driver" - depends on DRM_CLIENT_SELECTION - select DRM_CLIENT - select DRM_CLIENT_SETUP - select FRAMEBUFFER_CONSOLE_DETECT_PRIMARY if FRAMEBUFFER_CONSOLE - default FB - help - Choose this option if you have a need for the legacy fbdev - support. Note that this support also provides the linux console - support on top of your modesetting driver. - - If in doubt, say "Y". - -config DRM_FBDEV_OVERALLOC - int "Overallocation of the fbdev buffer" - depends on DRM_FBDEV_EMULATION - default 100 - help - Defines the fbdev buffer overallocation in percent. Default - is 100. Typical values for double buffering will be 200, - triple buffering 300. - -config DRM_FBDEV_LEAK_PHYS_SMEM - bool "Shamelessly allow leaking of fbdev physical address (DANGEROUS)" - depends on DRM_FBDEV_EMULATION && EXPERT - default n - help - In order to keep user-space compatibility, we want in certain - use-cases to keep leaking the fbdev physical address to the - user-space program handling the fbdev buffer. - This affects, not only, Amlogic, Allwinner or Rockchip devices - with ARM Mali GPUs using an userspace Blob. - This option is not supported by upstream developers and should be - removed as soon as possible and be considered as a broken and - legacy behaviour from a modern fbdev device driver. - - Please send any bug reports when using this to your proprietary - software vendor that requires this. - - If in doubt, say "N" or spread the word to your closed source - library vendor. - -endmenu +source "drivers/gpu/drm/clients/Kconfig" config DRM_LOAD_EDID_FIRMWARE bool "Allow to specify an EDID data set instead of probing for it" @@ -526,6 +456,10 @@ config DRM_HYPERV config DRM_EXPORT_FOR_TESTS bool +# Separate option as not all DRM drivers use it +config DRM_PANEL_BACKLIGHT_QUIRKS + tristate + config DRM_LIB_RANDOM bool default n diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 463afad1b5ca..1677c1f335fb 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -95,6 +95,7 @@ drm-$(CONFIG_DRM_PANIC_SCREEN_QR_CODE) += drm_panic_qr.o obj-$(CONFIG_DRM) += drm.o obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o +obj-$(CONFIG_DRM_PANEL_BACKLIGHT_QUIRKS) += drm_panel_backlight_quirks.o # # Memory-management helpers @@ -149,14 +150,6 @@ drm_kms_helper-$(CONFIG_DRM_FBDEV_EMULATION) += drm_fb_helper.o obj-$(CONFIG_DRM_KMS_HELPER) += drm_kms_helper.o # -# DRM clients -# - -drm_client_lib-y := drm_client_setup.o -drm_client_lib-$(CONFIG_DRM_FBDEV_EMULATION) += drm_fbdev_client.o -obj-$(CONFIG_DRM_CLIENT_LIB) += drm_client_lib.o - -# # Drivers and the rest # @@ -165,6 +158,7 @@ obj-y += tests/ obj-$(CONFIG_DRM_MIPI_DBI) += drm_mipi_dbi.o obj-$(CONFIG_DRM_MIPI_DSI) += drm_mipi_dsi.o obj-y += arm/ +obj-y += clients/ obj-y += display/ obj-$(CONFIG_DRM_TTM) += ttm/ obj-$(CONFIG_DRM_SCHED) += scheduler/ diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 41fa3377d9cf..1a11cab741ac 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -26,6 +26,7 @@ config DRM_AMDGPU select DRM_BUDDY select DRM_SUBALLOC_HELPER select DRM_EXEC + select DRM_PANEL_BACKLIGHT_QUIRKS # amdgpu depends on ACPI_VIDEO when ACPI is enabled, for select to work # ACPI_VIDEO's dependencies must also be selected. select INPUT if ACPI diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 38686203bea6..eaeaaddb32cd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -23,7 +23,7 @@ */ #include <drm/amdgpu_drm.h> -#include <drm/drm_client_setup.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_ttm.h> #include <drm/drm_gem.h> @@ -2916,7 +2916,6 @@ static const struct drm_driver amdgpu_kms_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = KMS_DRIVER_MAJOR, .minor = KMS_DRIVER_MINOR, .patchlevel = KMS_DRIVER_PATCHLEVEL, @@ -2940,7 +2939,6 @@ const struct drm_driver amdgpu_partition_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = KMS_DRIVER_MAJOR, .minor = KMS_DRIVER_MINOR, .patchlevel = KMS_DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h index 5bc2cb661af7..2d86cc6f7f4d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h @@ -40,7 +40,6 @@ #define DRIVER_NAME "amdgpu" #define DRIVER_DESC "AMD GPU" -#define DRIVER_DATE "20150101" extern const struct drm_driver amdgpu_partition_driver; diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 48be917e7bc5..243cee284131 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -93,6 +93,7 @@ #include <drm/drm_fourcc.h> #include <drm/drm_edid.h> #include <drm/drm_eld.h> +#include <drm/drm_utils.h> #include <drm/drm_vblank.h> #include <drm/drm_audio_component.h> #include <drm/drm_gem_atomic_helper.h> @@ -3457,6 +3458,7 @@ static void update_connector_ext_caps(struct amdgpu_dm_connector *aconnector) struct drm_connector *conn_base; struct amdgpu_device *adev; struct drm_luminance_range_info *luminance_range; + int min_input_signal_override; if (aconnector->bl_idx == -1 || aconnector->dc_link->connector_signal != SIGNAL_TYPE_EDP) @@ -3493,6 +3495,10 @@ static void update_connector_ext_caps(struct amdgpu_dm_connector *aconnector) caps->aux_min_input_signal = 0; caps->aux_max_input_signal = 512; } + + min_input_signal_override = drm_get_panel_min_brightness_quirk(aconnector->drm_edid); + if (min_input_signal_override >= 0) + caps->min_input_signal = min_input_signal_override; } void amdgpu_dm_update_connector_after_detect( diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_drv.c b/drivers/gpu/drm/arm/display/komeda/komeda_drv.c index d981d721e796..358c1512b087 100644 --- a/drivers/gpu/drm/arm/display/komeda/komeda_drv.c +++ b/drivers/gpu/drm/arm/display/komeda/komeda_drv.c @@ -9,7 +9,7 @@ #include <linux/of.h> #include <linux/platform_device.h> #include <linux/pm_runtime.h> -#include <drm/drm_client_setup.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_module.h> #include <drm/drm_of.h> #include "komeda_dev.h" diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c index 1e7b1fcb2848..6ed504099188 100644 --- a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c +++ b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c @@ -63,7 +63,6 @@ static const struct drm_driver komeda_kms_driver = { .fops = &komeda_cma_fops, .name = "komeda", .desc = "Arm Komeda Display Processor driver", - .date = "20181101", .major = 0, .minor = 1, }; diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c index 191b806624df..c3179d74f3f5 100644 --- a/drivers/gpu/drm/arm/hdlcd_drv.c +++ b/drivers/gpu/drm/arm/hdlcd_drv.c @@ -22,8 +22,8 @@ #include <linux/platform_device.h> #include <linux/pm_runtime.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_crtc.h> #include <drm/drm_debugfs.h> #include <drm/drm_drv.h> @@ -233,7 +233,6 @@ static const struct drm_driver hdlcd_driver = { .fops = &fops, .name = "hdlcd", .desc = "ARM HDLCD Controller DRM", - .date = "20151021", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/arm/malidp_drv.c b/drivers/gpu/drm/arm/malidp_drv.c index fd2be80f3bf5..e083021e9e99 100644 --- a/drivers/gpu/drm/arm/malidp_drv.c +++ b/drivers/gpu/drm/arm/malidp_drv.c @@ -16,9 +16,9 @@ #include <linux/pm_runtime.h> #include <linux/debugfs.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_crtc.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> @@ -570,7 +570,6 @@ static const struct drm_driver malidp_driver = { .fops = &fops, .name = "mali-dp", .desc = "ARM Mali Display Processor driver", - .date = "20160106", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c index 650e450cc19b..cae25ad66c74 100644 --- a/drivers/gpu/drm/armada/armada_drv.c +++ b/drivers/gpu/drm/armada/armada_drv.c @@ -11,8 +11,8 @@ #include <linux/of_graph.h> #include <linux/platform_device.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_ioctl.h> #include <drm/drm_managed.h> @@ -45,7 +45,6 @@ static const struct drm_driver armada_drm_driver = { .minor = 0, .name = "armada-drm", .desc = "Armada SoC DRM", - .date = "20120730", .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC, .ioctls = armada_ioctls, .num_ioctls = ARRAY_SIZE(armada_ioctls), diff --git a/drivers/gpu/drm/aspeed/aspeed_gfx_drv.c b/drivers/gpu/drm/aspeed/aspeed_gfx_drv.c index b7e608ba6194..397e677a691c 100644 --- a/drivers/gpu/drm/aspeed/aspeed_gfx_drv.c +++ b/drivers/gpu/drm/aspeed/aspeed_gfx_drv.c @@ -13,8 +13,8 @@ #include <linux/regmap.h> #include <linux/reset.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_device.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -252,7 +252,6 @@ static const struct drm_driver aspeed_gfx_driver = { .fops = &fops, .name = "aspeed-gfx-drm", .desc = "ASPEED GFX DRM", - .date = "20180319", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/ast/ast_drv.c b/drivers/gpu/drm/ast/ast_drv.c index 4afe4be072ef..ff3bcdd1cff2 100644 --- a/drivers/gpu/drm/ast/ast_drv.c +++ b/drivers/gpu/drm/ast/ast_drv.c @@ -31,8 +31,8 @@ #include <linux/of.h> #include <linux/pci.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_shmem.h> #include <drm/drm_gem_shmem_helper.h> @@ -60,7 +60,6 @@ static const struct drm_driver ast_driver = { .fops = &ast_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h index 21ce3769bf0d..6b4305ac07d4 100644 --- a/drivers/gpu/drm/ast/ast_drv.h +++ b/drivers/gpu/drm/ast/ast_drv.h @@ -43,7 +43,6 @@ #define DRIVER_NAME "ast" #define DRIVER_DESC "AST" -#define DRIVER_DATE "20120228" #define DRIVER_MAJOR 0 #define DRIVER_MINOR 1 diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c index 7b209af7cf45..fa8ad94e431a 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c @@ -16,9 +16,9 @@ #include <linux/pm_runtime.h> #include <linux/platform_device.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_fourcc.h> @@ -846,7 +846,6 @@ static const struct drm_driver atmel_hlcdc_dc_driver = { .fops = &fops, .name = "atmel-hlcdc", .desc = "Atmel HLCD Controller DRM", - .date = "20141504", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.c b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.c index 31832ba4017f..42248f179b69 100644 --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.c +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.c @@ -500,34 +500,6 @@ static void cdns_mhdp_hdcp_prop_work(struct work_struct *work) drm_modeset_unlock(&dev->mode_config.connection_mutex); } -int cdns_mhdp_hdcp_set_lc(struct cdns_mhdp_device *mhdp, u8 *val) -{ - int ret; - - mutex_lock(&mhdp->mbox_mutex); - ret = cdns_mhdp_secure_mailbox_send(mhdp, MB_MODULE_ID_HDCP_GENERAL, - HDCP_GENERAL_SET_LC_128, - 16, val); - mutex_unlock(&mhdp->mbox_mutex); - - return ret; -} - -int -cdns_mhdp_hdcp_set_public_key_param(struct cdns_mhdp_device *mhdp, - struct cdns_hdcp_tx_public_key_param *val) -{ - int ret; - - mutex_lock(&mhdp->mbox_mutex); - ret = cdns_mhdp_secure_mailbox_send(mhdp, MB_MODULE_ID_HDCP_TX, - HDCP2X_TX_SET_PUBLIC_KEY_PARAMS, - sizeof(*val), (u8 *)val); - mutex_unlock(&mhdp->mbox_mutex); - - return ret; -} - int cdns_mhdp_hdcp_enable(struct cdns_mhdp_device *mhdp, u8 content_type) { int ret; diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.h b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.h index 334c0b8b0d4f..3b6ec9c3a8d8 100644 --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.h +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.h @@ -82,9 +82,6 @@ struct cdns_hdcp_tx_public_key_param { u8 E[DLP_E]; }; -int cdns_mhdp_hdcp_set_public_key_param(struct cdns_mhdp_device *mhdp, - struct cdns_hdcp_tx_public_key_param *val); -int cdns_mhdp_hdcp_set_lc(struct cdns_mhdp_device *mhdp, u8 *val); int cdns_mhdp_hdcp_enable(struct cdns_mhdp_device *mhdp, u8 content_type); int cdns_mhdp_hdcp_disable(struct cdns_mhdp_device *mhdp); void cdns_mhdp_hdcp_init(struct cdns_mhdp_device *mhdp); diff --git a/drivers/gpu/drm/bridge/chipone-icn6211.c b/drivers/gpu/drm/bridge/chipone-icn6211.c index 9eecac457dcf..d47703559b0d 100644 --- a/drivers/gpu/drm/bridge/chipone-icn6211.c +++ b/drivers/gpu/drm/bridge/chipone-icn6211.c @@ -785,7 +785,7 @@ static struct mipi_dsi_driver chipone_dsi_driver = { }, }; -static struct i2c_device_id chipone_i2c_id[] = { +static const struct i2c_device_id chipone_i2c_id[] = { { "chipone,icn6211" }, {}, }; diff --git a/drivers/gpu/drm/bridge/lontium-lt9211.c b/drivers/gpu/drm/bridge/lontium-lt9211.c index c8881796fba4..999ddebb832d 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9211.c +++ b/drivers/gpu/drm/bridge/lontium-lt9211.c @@ -773,7 +773,7 @@ static void lt9211_remove(struct i2c_client *client) drm_bridge_remove(&ctx->bridge); } -static struct i2c_device_id lt9211_id[] = { +static const struct i2c_device_id lt9211_id[] = { { "lontium,lt9211" }, {}, }; diff --git a/drivers/gpu/drm/bridge/lontium-lt9611.c b/drivers/gpu/drm/bridge/lontium-lt9611.c index 1b31fdebe164..8f25b338a8d8 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9611.c +++ b/drivers/gpu/drm/bridge/lontium-lt9611.c @@ -1235,7 +1235,7 @@ static void lt9611_remove(struct i2c_client *client) of_node_put(lt9611->dsi0_node); } -static struct i2c_device_id lt9611_id[] = { +static const struct i2c_device_id lt9611_id[] = { { "lontium,lt9611", 0 }, {} }; diff --git a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c index 4d1d40e1f1b4..f89af8203c9d 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c +++ b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c @@ -913,7 +913,7 @@ static void lt9611uxc_remove(struct i2c_client *client) of_node_put(lt9611uxc->dsi0_node); } -static struct i2c_device_id lt9611uxc_id[] = { +static const struct i2c_device_id lt9611uxc_id[] = { { "lontium,lt9611uxc", 0 }, { /* sentinel */ } }; diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi83.c b/drivers/gpu/drm/bridge/ti-sn65dsi83.c index 57a7ed13f996..00d3bfa645f5 100644 --- a/drivers/gpu/drm/bridge/ti-sn65dsi83.c +++ b/drivers/gpu/drm/bridge/ti-sn65dsi83.c @@ -732,7 +732,7 @@ static void sn65dsi83_remove(struct i2c_client *client) drm_bridge_remove(&ctx->bridge); } -static struct i2c_device_id sn65dsi83_id[] = { +static const struct i2c_device_id sn65dsi83_id[] = { { "ti,sn65dsi83", MODEL_SN65DSI83 }, { "ti,sn65dsi84", MODEL_SN65DSI84 }, {}, diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c b/drivers/gpu/drm/bridge/ti-sn65dsi86.c index 9e31f750fd88..ce4c026b064f 100644 --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c @@ -1970,7 +1970,7 @@ static int ti_sn65dsi86_probe(struct i2c_client *client) return ti_sn65dsi86_add_aux_device(pdata, &pdata->aux_aux, "aux"); } -static struct i2c_device_id ti_sn65dsi86_id[] = { +static const struct i2c_device_id ti_sn65dsi86_id[] = { { "ti,sn65dsi86", 0}, {}, }; diff --git a/drivers/gpu/drm/clients/Kconfig b/drivers/gpu/drm/clients/Kconfig new file mode 100644 index 000000000000..01ad3b000130 --- /dev/null +++ b/drivers/gpu/drm/clients/Kconfig @@ -0,0 +1,73 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config DRM_CLIENT_LIB + tristate + depends on DRM + select DRM_KMS_HELPER if DRM_FBDEV_EMULATION + select FB_CORE if DRM_FBDEV_EMULATION + help + This option enables the DRM client library and selects all + modules and components according to the enabled clients. + +config DRM_CLIENT_SELECTION + tristate + depends on DRM + select DRM_CLIENT_LIB if DRM_FBDEV_EMULATION + help + Drivers that support in-kernel DRM clients have to select this + option. + +config DRM_CLIENT_SETUP + bool + depends on DRM_CLIENT_SELECTION + help + Enables the DRM client selection. DRM drivers that support the + default clients should select DRM_CLIENT_SELECTION instead. + +menu "Supported DRM clients" + depends on DRM_CLIENT_SELECTION + +config DRM_FBDEV_EMULATION + bool "Enable legacy fbdev support for your modesetting driver" + depends on DRM_CLIENT_SELECTION + select DRM_CLIENT + select DRM_CLIENT_SETUP + select FRAMEBUFFER_CONSOLE_DETECT_PRIMARY if FRAMEBUFFER_CONSOLE + default FB + help + Choose this option if you have a need for the legacy fbdev + support. Note that this support also provides the linux console + support on top of your modesetting driver. + + If in doubt, say "Y". + +config DRM_FBDEV_OVERALLOC + int "Overallocation of the fbdev buffer" + depends on DRM_FBDEV_EMULATION + default 100 + help + Defines the fbdev buffer overallocation in percent. Default + is 100. Typical values for double buffering will be 200, + triple buffering 300. + +config DRM_FBDEV_LEAK_PHYS_SMEM + bool "Shamelessly allow leaking of fbdev physical address (DANGEROUS)" + depends on DRM_FBDEV_EMULATION && EXPERT + default n + help + In order to keep user-space compatibility, we want in certain + use-cases to keep leaking the fbdev physical address to the + user-space program handling the fbdev buffer. + This affects, not only, Amlogic, Allwinner or Rockchip devices + with ARM Mali GPUs using a userspace Blob. + This option is not supported by upstream developers and should be + removed as soon as possible and be considered as a broken and + legacy behaviour from a modern fbdev device driver. + + Please send any bug reports when using this to your proprietary + software vendor that requires this. + + If in doubt, say "N" or spread the word to your closed source + library vendor. + +endmenu diff --git a/drivers/gpu/drm/clients/Makefile b/drivers/gpu/drm/clients/Makefile new file mode 100644 index 000000000000..1d004ec92e1e --- /dev/null +++ b/drivers/gpu/drm/clients/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +drm_client_lib-y := drm_client_setup.o +drm_client_lib-$(CONFIG_DRM_FBDEV_EMULATION) += drm_fbdev_client.o +obj-$(CONFIG_DRM_CLIENT_LIB) += drm_client_lib.o diff --git a/include/drm/drm_fbdev_client.h b/drivers/gpu/drm/clients/drm_client_internal.h index e11a5614f127..23258934956a 100644 --- a/include/drm/drm_fbdev_client.h +++ b/drivers/gpu/drm/clients/drm_client_internal.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: MIT */ -#ifndef DRM_FBDEV_CLIENT_H -#define DRM_FBDEV_CLIENT_H +#ifndef DRM_CLIENT_INTERNAL_H +#define DRM_CLIENT_INTERNAL_H struct drm_device; struct drm_format_info; diff --git a/drivers/gpu/drm/drm_client_setup.c b/drivers/gpu/drm/clients/drm_client_setup.c index c14221ca5a0d..4b211a4812b5 100644 --- a/drivers/gpu/drm/drm_client_setup.c +++ b/drivers/gpu/drm/clients/drm_client_setup.c @@ -1,11 +1,12 @@ // SPDX-License-Identifier: MIT -#include <drm/drm_client_setup.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_device.h> -#include <drm/drm_fbdev_client.h> #include <drm/drm_fourcc.h> #include <drm/drm_print.h> +#include "drm_client_internal.h" + /** * drm_client_setup() - Setup in-kernel DRM clients * @dev: DRM device diff --git a/drivers/gpu/drm/drm_fbdev_client.c b/drivers/gpu/drm/clients/drm_fbdev_client.c index 246fb63ab250..f894ba52bdb5 100644 --- a/drivers/gpu/drm/drm_fbdev_client.c +++ b/drivers/gpu/drm/clients/drm_fbdev_client.c @@ -3,11 +3,12 @@ #include <drm/drm_client.h> #include <drm/drm_crtc_helper.h> #include <drm/drm_drv.h> -#include <drm/drm_fbdev_client.h> #include <drm/drm_fb_helper.h> #include <drm/drm_fourcc.h> #include <drm/drm_print.h> +#include "drm_client_internal.h" + /* * struct drm_client_funcs */ @@ -164,4 +165,3 @@ err_drm_client_init: kfree(fb_helper); return ret; } -EXPORT_SYMBOL(drm_fbdev_client_setup); diff --git a/drivers/gpu/drm/display/drm_dp_helper.c b/drivers/gpu/drm/display/drm_dp_helper.c index 6ee51003de3c..da3c8521a7fa 100644 --- a/drivers/gpu/drm/display/drm_dp_helper.c +++ b/drivers/gpu/drm/display/drm_dp_helper.c @@ -22,15 +22,16 @@ #include <linux/backlight.h> #include <linux/delay.h> +#include <linux/dynamic_debug.h> #include <linux/errno.h> #include <linux/i2c.h> #include <linux/init.h> +#include <linux/iopoll.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/sched.h> #include <linux/seq_file.h> #include <linux/string_helpers.h> -#include <linux/dynamic_debug.h> #include <drm/display/drm_dp_helper.h> #include <drm/display/drm_dp_mst_helper.h> @@ -779,6 +780,128 @@ int drm_dp_dpcd_read_phy_link_status(struct drm_dp_aux *aux, } EXPORT_SYMBOL(drm_dp_dpcd_read_phy_link_status); +static int read_payload_update_status(struct drm_dp_aux *aux) +{ + int ret; + u8 status; + + ret = drm_dp_dpcd_readb(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status); + if (ret < 0) + return ret; + + return status; +} + +/** + * drm_dp_dpcd_write_payload() - Write Virtual Channel information to payload table + * @aux: DisplayPort AUX channel + * @vcpid: Virtual Channel Payload ID + * @start_time_slot: Starting time slot + * @time_slot_count: Time slot count + * + * Write the Virtual Channel payload allocation table, checking the payload + * update status and retrying as necessary. + * + * Returns: + * 0 on success, negative error otherwise + */ +int drm_dp_dpcd_write_payload(struct drm_dp_aux *aux, + int vcpid, u8 start_time_slot, u8 time_slot_count) +{ + u8 payload_alloc[3], status; + int ret; + int retries = 0; + + drm_dp_dpcd_writeb(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, + DP_PAYLOAD_TABLE_UPDATED); + + payload_alloc[0] = vcpid; + payload_alloc[1] = start_time_slot; + payload_alloc[2] = time_slot_count; + + ret = drm_dp_dpcd_write(aux, DP_PAYLOAD_ALLOCATE_SET, payload_alloc, 3); + if (ret != 3) { + drm_dbg_kms(aux->drm_dev, "failed to write payload allocation %d\n", ret); + goto fail; + } + +retry: + ret = drm_dp_dpcd_readb(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status); + if (ret < 0) { + drm_dbg_kms(aux->drm_dev, "failed to read payload table status %d\n", ret); + goto fail; + } + + if (!(status & DP_PAYLOAD_TABLE_UPDATED)) { + retries++; + if (retries < 20) { + usleep_range(10000, 20000); + goto retry; + } + drm_dbg_kms(aux->drm_dev, "status not set after read payload table status %d\n", + status); + ret = -EINVAL; + goto fail; + } + ret = 0; +fail: + return ret; +} +EXPORT_SYMBOL(drm_dp_dpcd_write_payload); + +/** + * drm_dp_dpcd_clear_payload() - Clear the entire VC Payload ID table + * @aux: DisplayPort AUX channel + * + * Clear the entire VC Payload ID table. + * + * Returns: 0 on success, negative error code on errors. + */ +int drm_dp_dpcd_clear_payload(struct drm_dp_aux *aux) +{ + return drm_dp_dpcd_write_payload(aux, 0, 0, 0x3f); +} +EXPORT_SYMBOL(drm_dp_dpcd_clear_payload); + +/** + * drm_dp_dpcd_poll_act_handled() - Poll for ACT handled status + * @aux: DisplayPort AUX channel + * @timeout_ms: Timeout in ms + * + * Try waiting for the sink to finish updating its payload table by polling for + * the ACT handled bit of DP_PAYLOAD_TABLE_UPDATE_STATUS for up to @timeout_ms + * milliseconds, defaulting to 3000 ms if 0. + * + * Returns: + * 0 if the ACT was handled in time, negative error code on failure. + */ +int drm_dp_dpcd_poll_act_handled(struct drm_dp_aux *aux, int timeout_ms) +{ + int ret, status; + + /* default to 3 seconds, this is arbitrary */ + timeout_ms = timeout_ms ?: 3000; + + ret = readx_poll_timeout(read_payload_update_status, aux, status, + status & DP_PAYLOAD_ACT_HANDLED || status < 0, + 200, timeout_ms * USEC_PER_MSEC); + if (ret < 0 && status >= 0) { + drm_err(aux->drm_dev, "Failed to get ACT after %d ms, last status: %02x\n", + timeout_ms, status); + return -EINVAL; + } else if (status < 0) { + /* + * Failure here isn't unexpected - the hub may have + * just been unplugged + */ + drm_dbg_kms(aux->drm_dev, "Failed to read payload table status: %d\n", status); + return status; + } + + return 0; +} +EXPORT_SYMBOL(drm_dp_dpcd_poll_act_handled); + static bool is_edid_digital_input_dp(const struct drm_edid *drm_edid) { /* FIXME: get rid of drm_edid_raw() */ diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c index dc4446d589e7..687c70308d82 100644 --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c @@ -29,7 +29,6 @@ #include <linux/random.h> #include <linux/sched.h> #include <linux/seq_file.h> -#include <linux/iopoll.h> #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS) #include <linux/stacktrace.h> @@ -68,9 +67,6 @@ static bool dump_dp_payload_table(struct drm_dp_mst_topology_mgr *mgr, static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port); -static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr, - int id, u8 start_slot, u8 num_slots); - static int drm_dp_send_dpcd_read(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port, int offset, int size, u8 *bytes); @@ -3267,7 +3263,7 @@ EXPORT_SYMBOL(drm_dp_send_query_stream_enc_status); static int drm_dp_create_payload_at_dfp(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_atomic_payload *payload) { - return drm_dp_dpcd_write_payload(mgr, payload->vcpi, payload->vc_start_slot, + return drm_dp_dpcd_write_payload(mgr->aux, payload->vcpi, payload->vc_start_slot, payload->time_slots); } @@ -3298,7 +3294,7 @@ static void drm_dp_destroy_payload_at_remote_and_dfp(struct drm_dp_mst_topology_ } if (payload->payload_allocation_status == DRM_DP_MST_PAYLOAD_ALLOCATION_DFP) - drm_dp_dpcd_write_payload(mgr, payload->vcpi, payload->vc_start_slot, 0); + drm_dp_dpcd_write_payload(mgr->aux, payload->vcpi, payload->vc_start_slot, 0); } /** @@ -3686,7 +3682,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms goto out_unlock; /* Write reset payload */ - drm_dp_dpcd_write_payload(mgr, 0, 0, 0x3f); + drm_dp_dpcd_clear_payload(mgr->aux); drm_dp_mst_queue_probe_work(mgr); @@ -4747,61 +4743,6 @@ void drm_dp_mst_update_slots(struct drm_dp_mst_topology_state *mst_state, uint8_ } EXPORT_SYMBOL(drm_dp_mst_update_slots); -static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr, - int id, u8 start_slot, u8 num_slots) -{ - u8 payload_alloc[3], status; - int ret; - int retries = 0; - - drm_dp_dpcd_writeb(mgr->aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, - DP_PAYLOAD_TABLE_UPDATED); - - payload_alloc[0] = id; - payload_alloc[1] = start_slot; - payload_alloc[2] = num_slots; - - ret = drm_dp_dpcd_write(mgr->aux, DP_PAYLOAD_ALLOCATE_SET, payload_alloc, 3); - if (ret != 3) { - drm_dbg_kms(mgr->dev, "failed to write payload allocation %d\n", ret); - goto fail; - } - -retry: - ret = drm_dp_dpcd_readb(mgr->aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status); - if (ret < 0) { - drm_dbg_kms(mgr->dev, "failed to read payload table status %d\n", ret); - goto fail; - } - - if (!(status & DP_PAYLOAD_TABLE_UPDATED)) { - retries++; - if (retries < 20) { - usleep_range(10000, 20000); - goto retry; - } - drm_dbg_kms(mgr->dev, "status not set after read payload table status %d\n", - status); - ret = -EINVAL; - goto fail; - } - ret = 0; -fail: - return ret; -} - -static int do_get_act_status(struct drm_dp_aux *aux) -{ - int ret; - u8 status; - - ret = drm_dp_dpcd_readb(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status); - if (ret < 0) - return ret; - - return status; -} - /** * drm_dp_check_act_status() - Polls for ACT handled status. * @mgr: manager to use @@ -4819,28 +4760,9 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr) * There doesn't seem to be any recommended retry count or timeout in * the MST specification. Since some hubs have been observed to take * over 1 second to update their payload allocations under certain - * conditions, we use a rather large timeout value. + * conditions, we use a rather large timeout value of 3 seconds. */ - const int timeout_ms = 3000; - int ret, status; - - ret = readx_poll_timeout(do_get_act_status, mgr->aux, status, - status & DP_PAYLOAD_ACT_HANDLED || status < 0, - 200, timeout_ms * USEC_PER_MSEC); - if (ret < 0 && status >= 0) { - drm_err(mgr->dev, "Failed to get ACT after %dms, last status: %02x\n", - timeout_ms, status); - return -EINVAL; - } else if (status < 0) { - /* - * Failure here isn't unexpected - the hub may have - * just been unplugged - */ - drm_dbg_kms(mgr->dev, "Failed to read payload table status: %d\n", status); - return status; - } - - return 0; + return drm_dp_dpcd_poll_act_handled(mgr->aux, 3000); } EXPORT_SYMBOL(drm_dp_check_act_status); diff --git a/drivers/gpu/drm/drm_of.c b/drivers/gpu/drm/drm_of.c index 5c2abc9eca9c..5530919e0ba0 100644 --- a/drivers/gpu/drm/drm_of.c +++ b/drivers/gpu/drm/drm_of.c @@ -564,6 +564,8 @@ EXPORT_SYMBOL_GPL(drm_of_get_data_lanes_count_ep); * Gets parent DSI bus for a DSI device controlled through a bus other * than MIPI-DCS (SPI, I2C, etc.) using the Device Tree. * + * This function assumes that the device's port@0 is the DSI input. + * * Returns pointer to mipi_dsi_host if successful, -EINVAL if the * request is unsupported, -EPROBE_DEFER if the DSI host is found but * not available, or -ENODEV otherwise. @@ -576,7 +578,7 @@ struct mipi_dsi_host *drm_of_get_dsi_bus(struct device *dev) /* * Get first endpoint child from device. */ - endpoint = of_graph_get_next_endpoint(dev->of_node, NULL); + endpoint = of_graph_get_endpoint_by_regs(dev->of_node, 0, -1); if (!endpoint) return ERR_PTR(-ENODEV); diff --git a/drivers/gpu/drm/drm_panel_backlight_quirks.c b/drivers/gpu/drm/drm_panel_backlight_quirks.c new file mode 100644 index 000000000000..c477d98ade2b --- /dev/null +++ b/drivers/gpu/drm/drm_panel_backlight_quirks.c @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/array_size.h> +#include <linux/dmi.h> +#include <linux/mod_devicetable.h> +#include <linux/module.h> +#include <drm/drm_edid.h> +#include <drm/drm_utils.h> + +struct drm_panel_min_backlight_quirk { + struct { + enum dmi_field field; + const char * const value; + } dmi_match; + struct drm_edid_ident ident; + u8 min_brightness; +}; + +static const struct drm_panel_min_backlight_quirk drm_panel_min_backlight_quirks[] = { + /* 13 inch matte panel */ + { + .dmi_match.field = DMI_BOARD_VENDOR, + .dmi_match.value = "Framework", + .ident.panel_id = drm_edid_encode_panel_id('B', 'O', 'E', 0x0bca), + .ident.name = "NE135FBM-N41", + .min_brightness = 0, + }, + /* 13 inch glossy panel */ + { + .dmi_match.field = DMI_BOARD_VENDOR, + .dmi_match.value = "Framework", + .ident.panel_id = drm_edid_encode_panel_id('B', 'O', 'E', 0x095f), + .ident.name = "NE135FBM-N41", + .min_brightness = 0, + }, + /* 13 inch 2.8k panel */ + { + .dmi_match.field = DMI_BOARD_VENDOR, + .dmi_match.value = "Framework", + .ident.panel_id = drm_edid_encode_panel_id('B', 'O', 'E', 0x0cb4), + .ident.name = "NE135A1M-NY1", + .min_brightness = 0, + }, +}; + +static bool drm_panel_min_backlight_quirk_matches(const struct drm_panel_min_backlight_quirk *quirk, + const struct drm_edid *edid) +{ + if (!dmi_match(quirk->dmi_match.field, quirk->dmi_match.value)) + return false; + + if (!drm_edid_match(edid, &quirk->ident)) + return false; + + return true; +} + +/** + * drm_get_panel_min_brightness_quirk - Get minimum supported brightness level for a panel. + * @edid: EDID of the panel to check + * + * This function checks for platform specific (e.g. DMI based) quirks + * providing info on the minimum backlight brightness for systems where this + * cannot be probed correctly from the hard-/firm-ware. + * + * Returns: + * A negative error value or + * an override value in the range [0, 255] representing 0-100% to be scaled to + * the drivers target range. + */ +int drm_get_panel_min_brightness_quirk(const struct drm_edid *edid) +{ + const struct drm_panel_min_backlight_quirk *quirk; + size_t i; + + if (!IS_ENABLED(CONFIG_DMI)) + return -ENODATA; + + if (!edid) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(drm_panel_min_backlight_quirks); i++) { + quirk = &drm_panel_min_backlight_quirks[i]; + + if (drm_panel_min_backlight_quirk_matches(quirk, edid)) + return quirk->min_brightness; + } + + return -ENODATA; +} +EXPORT_SYMBOL(drm_get_panel_min_brightness_quirk); + +MODULE_DESCRIPTION("Quirks for panel backlight overrides"); +MODULE_LICENSE("GPL"); diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c index a46f9e4ac09a..00707001d112 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c @@ -503,7 +503,6 @@ static const struct drm_driver etnaviv_drm_driver = { .fops = &fops, .name = "etnaviv", .desc = "etnaviv DRM", - .date = "20151214", .major = 1, .minor = 4, }; diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c index 1c44f85c5f54..f313ae7bc3a3 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c @@ -13,9 +13,9 @@ #include <linux/pm_runtime.h> #include <linux/uaccess.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_file.h> #include <drm/drm_fourcc.h> @@ -35,7 +35,6 @@ #define DRIVER_NAME "exynos" #define DRIVER_DESC "Samsung SoC DRM" -#define DRIVER_DATE "20180330" /* * Interface history: @@ -118,7 +117,6 @@ static const struct drm_driver exynos_drm_driver = { .fops = &exynos_drm_driver_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, }; diff --git a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c index be1ab673e49e..03b076db9381 100644 --- a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c +++ b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c @@ -18,8 +18,8 @@ #include <linux/pm_runtime.h> #include <linux/regmap.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -174,7 +174,6 @@ static const struct drm_driver fsl_dcu_drm_driver = { .fops = &fsl_dcu_drm_fops, .name = "fsl-dcu-drm", .desc = "Freescale DCU DRM", - .date = "20160425", .major = 1, .minor = 1, }; diff --git a/drivers/gpu/drm/gma500/psb_drv.c b/drivers/gpu/drm/gma500/psb_drv.c index c419ebbc49ec..85d3557c2eb9 100644 --- a/drivers/gpu/drm/gma500/psb_drv.c +++ b/drivers/gpu/drm/gma500/psb_drv.c @@ -19,8 +19,8 @@ #include <acpi/video.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_file.h> #include <drm/drm_ioctl.h> @@ -513,7 +513,6 @@ static const struct drm_driver driver = { .fops = &psb_gem_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL diff --git a/drivers/gpu/drm/gma500/psb_drv.h b/drivers/gpu/drm/gma500/psb_drv.h index de62cbfcdc72..7f77cb2b2751 100644 --- a/drivers/gpu/drm/gma500/psb_drv.h +++ b/drivers/gpu/drm/gma500/psb_drv.h @@ -26,7 +26,6 @@ #define DRIVER_NAME "gma500" #define DRIVER_DESC "DRM driver for the Intel GMA500, GMA600, GMA3600, GMA3650" -#define DRIVER_DATE "20140314" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 diff --git a/drivers/gpu/drm/gud/gud_drv.c b/drivers/gpu/drm/gud/gud_drv.c index 09ccdc1dc1a2..cb405771d6e2 100644 --- a/drivers/gpu/drm/gud/gud_drv.c +++ b/drivers/gpu/drm/gud/gud_drv.c @@ -13,9 +13,9 @@ #include <linux/vmalloc.h> #include <linux/workqueue.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_blend.h> -#include <drm/drm_client_setup.h> #include <drm/drm_damage_helper.h> #include <drm/drm_debugfs.h> #include <drm/drm_drv.h> @@ -381,7 +381,6 @@ static const struct drm_driver gud_drm_driver = { .name = "gud", .desc = "Generic USB Display", - .date = "20200422", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/hisilicon/hibmc/Kconfig b/drivers/gpu/drm/hisilicon/hibmc/Kconfig index 80253d39664a..93b8d32e3be1 100644 --- a/drivers/gpu/drm/hisilicon/hibmc/Kconfig +++ b/drivers/gpu/drm/hisilicon/hibmc/Kconfig @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only config DRM_HISI_HIBMC tristate "DRM Support for Hisilicon Hibmc" - depends on DRM && PCI && (ARM64 || COMPILE_TEST) + depends on DRM && PCI depends on MMU select DRM_CLIENT_SELECTION select DRM_KMS_HELPER diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c index 8c488c98ac97..7f814c32ed34 100644 --- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c +++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c @@ -15,8 +15,8 @@ #include <linux/module.h> #include <linux/pci.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_ttm.h> #include <drm/drm_gem_framebuffer_helper.h> @@ -57,7 +57,6 @@ static const struct drm_driver hibmc_driver = { .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC, .fops = &hibmc_fops, .name = "hibmc", - .date = "20160828", .desc = "hibmc drm driver", .major = 1, .minor = 0, diff --git a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c index 5616c3917c03..2eb49177ac42 100644 --- a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c +++ b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c @@ -929,7 +929,6 @@ static const struct drm_driver ade_driver = { DRM_FBDEV_DMA_DRIVER_OPS, .name = "kirin", .desc = "Hisilicon Kirin620 SoC DRM Driver", - .date = "20150718", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_drv.c b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_drv.c index b3ab944652a6..1e1c87be1204 100644 --- a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_drv.c +++ b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_drv.c @@ -17,8 +17,8 @@ #include <linux/of_graph.h> #include <linux/platform_device.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_gem_dma_helper.h> #include <drm/drm_gem_framebuffer_helper.h> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c index e0953777a206..f59abfa7622a 100644 --- a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c +++ b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c @@ -9,8 +9,8 @@ #include <linux/module.h> #include <linux/pci.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_shmem.h> #include <drm/drm_gem_shmem_helper.h> @@ -20,7 +20,6 @@ #define DRIVER_NAME "hyperv_drm" #define DRIVER_DESC "DRM driver for Hyper-V synthetic video device" -#define DRIVER_DATE "2020" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -31,7 +30,6 @@ static struct drm_driver hyperv_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h b/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h index f0e5c196eae4..c685479c9756 100644 --- a/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h @@ -200,6 +200,13 @@ #define XELPDP_SSC_ENABLE_PLLA REG_BIT(1) #define XELPDP_SSC_ENABLE_PLLB REG_BIT(0) +#define TCSS_DISP_MAILBOX_IN_CMD _MMIO(0x161300) +#define TCSS_DISP_MAILBOX_IN_CMD_RUN_BUSY REG_BIT(31) +#define TCSS_DISP_MAILBOX_IN_CMD_CMD_MASK REG_GENMASK(7, 0) +#define TCSS_DISP_MAILBOX_IN_CMD_DATA(val) REG_FIELD_PREP(TCSS_DISP_MAILBOX_IN_CMD_CMD_MASK, val) + +#define TCSS_DISP_MAILBOX_IN_DATA _MMIO(0x161304) + /* C10 Vendor Registers */ #define PHY_C10_VDR_PLL(idx) (0xC00 + (idx)) #define C10_PLL0_FRACEN REG_BIT8(4) diff --git a/drivers/gpu/drm/i915/display/intel_tc.c b/drivers/gpu/drm/i915/display/intel_tc.c index b16c4d2d4077..33b8f2772076 100644 --- a/drivers/gpu/drm/i915/display/intel_tc.c +++ b/drivers/gpu/drm/i915/display/intel_tc.c @@ -1013,21 +1013,52 @@ xelpdp_tc_phy_wait_for_tcss_power(struct intel_tc_port *tc, bool enabled) return true; } +/* + * Gfx driver WA 14020908590 for PTL tcss_rxdetect_clkswb_req/ack + * handshake violation when pwwreq= 0->1 during TC7/10 entry + */ +static void xelpdp_tc_power_request_wa(struct intel_display *display, bool enable) +{ + /* check if mailbox is running busy */ + if (intel_de_wait_for_clear(display, TCSS_DISP_MAILBOX_IN_CMD, + TCSS_DISP_MAILBOX_IN_CMD_RUN_BUSY, 10)) { + drm_dbg_kms(display->drm, + "Timeout waiting for TCSS mailbox run/busy bit to clear\n"); + return; + } + + intel_de_write(display, TCSS_DISP_MAILBOX_IN_DATA, enable ? 1 : 0); + intel_de_write(display, TCSS_DISP_MAILBOX_IN_CMD, + TCSS_DISP_MAILBOX_IN_CMD_RUN_BUSY | + TCSS_DISP_MAILBOX_IN_CMD_DATA(0x1)); + + /* wait to clear mailbox running busy bit before continuing */ + if (intel_de_wait_for_clear(display, TCSS_DISP_MAILBOX_IN_CMD, + TCSS_DISP_MAILBOX_IN_CMD_RUN_BUSY, 10)) { + drm_dbg_kms(display->drm, + "Timeout after writing data to mailbox. Mailbox run/busy bit did not clear\n"); + return; + } +} + static void __xelpdp_tc_phy_enable_tcss_power(struct intel_tc_port *tc, bool enable) { - struct drm_i915_private *i915 = tc_to_i915(tc); + struct intel_display *display = to_intel_display(tc->dig_port); enum port port = tc->dig_port->base.port; - i915_reg_t reg = XELPDP_PORT_BUF_CTL1(i915, port); + i915_reg_t reg = XELPDP_PORT_BUF_CTL1(display, port); u32 val; assert_tc_cold_blocked(tc); - val = intel_de_read(i915, reg); + if (DISPLAY_VER(display) == 30) + xelpdp_tc_power_request_wa(display, enable); + + val = intel_de_read(display, reg); if (enable) val |= XELPDP_TCSS_POWER_REQUEST; else val &= ~XELPDP_TCSS_POWER_REQUEST; - intel_de_write(i915, reg, val); + intel_de_write(display, reg, val); } static bool xelpdp_tc_phy_enable_tcss_power(struct intel_tc_port *tc, bool enable) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 40269e4c1e31..325da0414d94 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -126,9 +126,6 @@ execlists_active(const struct intel_engine_execlists *execlists) return active; } -struct i915_request * -execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists); - static inline u32 intel_read_status_page(const struct intel_engine_cs *engine, int reg) { diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index ba55c059063d..fe1f85e5dda3 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -343,6 +343,11 @@ struct intel_engine_guc_stats { * @start_gt_clk: GT clock time of last idle to active transition. */ u64 start_gt_clk; + + /** + * @total: The last value of total returned + */ + u64 total; }; union intel_engine_tlb_inv_reg { diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 72090f52fb85..4a80ffa1b962 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -405,15 +405,6 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) return active; } -struct i915_request * -execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists) -{ - struct intel_engine_cs *engine = - container_of(execlists, typeof(*engine), execlists); - - return __unwind_incomplete_requests(engine); -} - static void execlists_context_status_change(struct i915_request *rq, unsigned long status) { diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c index dcef8d498919..c207a4fb03bf 100644 --- a/drivers/gpu/drm/i915/gt/selftest_rps.c +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c @@ -1125,6 +1125,7 @@ static u64 measure_power(struct intel_rps *rps, int *freq) static u64 measure_power_at(struct intel_rps *rps, int *freq) { *freq = rps_set_check(rps, *freq); + msleep(100); return measure_power(rps, freq); } diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 9ede6f240d79..12f1ba7ca9c1 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1243,6 +1243,21 @@ static void __get_engine_usage_record(struct intel_engine_cs *engine, } while (++i < 6); } +static void __set_engine_usage_record(struct intel_engine_cs *engine, + u32 last_in, u32 id, u32 total) +{ + struct iosys_map rec_map = intel_guc_engine_usage_record_map(engine); + +#define record_write(map_, field_, val_) \ + iosys_map_wr_field(map_, 0, struct guc_engine_usage_record, field_, val_) + + record_write(&rec_map, last_switch_in_stamp, last_in); + record_write(&rec_map, current_context_index, id); + record_write(&rec_map, total_runtime, total); + +#undef record_write +} + static void guc_update_engine_gt_clks(struct intel_engine_cs *engine) { struct intel_engine_guc_stats *stats = &engine->stats.guc; @@ -1363,9 +1378,12 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now) total += intel_gt_clock_interval_to_ns(gt, clk); } + if (total > stats->total) + stats->total = total; + spin_unlock_irqrestore(&guc->timestamp.lock, flags); - return ns_to_ktime(total); + return ns_to_ktime(stats->total); } static void guc_enable_busyness_worker(struct intel_guc *guc) @@ -1431,8 +1449,21 @@ static void __reset_guc_busyness_stats(struct intel_guc *guc) guc_update_pm_timestamp(guc, &unused); for_each_engine(engine, gt, id) { + struct intel_engine_guc_stats *stats = &engine->stats.guc; + guc_update_engine_gt_clks(engine); - engine->stats.guc.prev_total = 0; + + /* + * If resetting a running context, accumulate the active + * time as well since there will be no context switch. + */ + if (stats->running) { + u64 clk = guc->timestamp.gt_stamp - stats->start_gt_clk; + + stats->total_gt_clks += clk; + } + stats->prev_total = 0; + stats->running = 0; } spin_unlock_irqrestore(&guc->timestamp.lock, flags); @@ -1543,6 +1574,9 @@ err_trylock: static int guc_action_enable_usage_stats(struct intel_guc *guc) { + struct intel_gt *gt = guc_to_gt(guc); + struct intel_engine_cs *engine; + enum intel_engine_id id; u32 offset = intel_guc_engine_usage_offset(guc); u32 action[] = { INTEL_GUC_ACTION_SET_ENG_UTIL_BUFF, @@ -1550,6 +1584,9 @@ static int guc_action_enable_usage_stats(struct intel_guc *guc) 0, }; + for_each_engine(engine, gt, id) + __set_engine_usage_record(engine, 0, 0xffffffff, 0); + return intel_guc_send(guc, action, ARRAY_SIZE(action)); } @@ -1688,6 +1725,10 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc) spin_lock_irq(guc_to_gt(guc)->irq_lock); spin_unlock_irq(guc_to_gt(guc)->irq_lock); + /* Flush tasklet */ + tasklet_disable(&guc->ct.receive_tasklet); + tasklet_enable(&guc->ct.receive_tasklet); + guc_flush_submissions(guc); guc_flush_destroyed_contexts(guc); flush_work(&guc->ct.requests.worker); @@ -2005,6 +2046,8 @@ void intel_guc_submission_cancel_requests(struct intel_guc *guc) void intel_guc_submission_reset_finish(struct intel_guc *guc) { + int outstanding; + /* Reset called during driver load or during wedge? */ if (unlikely(!guc_submission_initialized(guc) || !intel_guc_is_fw_running(guc) || @@ -2018,8 +2061,10 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc) * see in CI if this happens frequently / a precursor to taking down the * machine. */ - if (atomic_read(&guc->outstanding_submission_g2h)) - guc_err(guc, "Unexpected outstanding GuC to Host in reset finish\n"); + outstanding = atomic_read(&guc->outstanding_submission_g2h); + if (outstanding) + guc_err(guc, "Unexpected outstanding GuC to Host response(s) in reset finish: %d\n", + outstanding); atomic_set(&guc->outstanding_submission_g2h, 0); intel_guc_global_policies_update(guc); diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c index 365329ff8a07..1bafefb726f5 100644 --- a/drivers/gpu/drm/i915/i915_driver.c +++ b/drivers/gpu/drm/i915/i915_driver.c @@ -1785,7 +1785,6 @@ static const struct drm_driver i915_drm_driver = { .fops = &i915_driver_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/i915/i915_driver.h b/drivers/gpu/drm/i915/i915_driver.h index 94a70d8ec5d5..4b67ad9a61cd 100644 --- a/drivers/gpu/drm/i915/i915_driver.h +++ b/drivers/gpu/drm/i915/i915_driver.h @@ -15,7 +15,6 @@ struct drm_printer; #define DRIVER_NAME "i915" #define DRIVER_DESC "Intel Graphics" -#define DRIVER_DATE "20230929" #define DRIVER_TIMESTAMP 1695980603 extern const struct dev_pm_ops i915_pm_ops; diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 71c0daef1996..819ab933bb10 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -841,7 +841,6 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m, err_printf(m, "Kernel: %s %s\n", init_utsname()->release, init_utsname()->machine); - err_printf(m, "Driver: %s\n", DRIVER_DATE); ts = ktime_to_timespec64(error->time); err_printf(m, "Time: %lld s %ld us\n", (s64)ts.tv_sec, ts.tv_nsec / NSEC_PER_USEC); diff --git a/drivers/gpu/drm/i915/i915_mm.c b/drivers/gpu/drm/i915/i915_mm.c index f5c97a620962..76e2801619f0 100644 --- a/drivers/gpu/drm/i915/i915_mm.c +++ b/drivers/gpu/drm/i915/i915_mm.c @@ -143,8 +143,8 @@ int remap_io_sg(struct vm_area_struct *vma, /* We rely on prevalidation of the io-mapping to skip track_pfn(). */ GEM_BUG_ON((vma->vm_flags & EXPECTED_FLAGS) != EXPECTED_FLAGS); - while (offset >= sg_dma_len(r.sgt.sgp) >> PAGE_SHIFT) { - offset -= sg_dma_len(r.sgt.sgp) >> PAGE_SHIFT; + while (offset >= r.sgt.max >> PAGE_SHIFT) { + offset -= r.sgt.max >> PAGE_SHIFT; r.sgt = __sgt_iter(__sg_next(r.sgt.sgp), use_dma(iobase)); if (!r.sgt.sgp) return -EINVAL; diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 93fbf53578da..e55db036be1b 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -302,7 +302,7 @@ void i915_pmu_gt_parked(struct intel_gt *gt) { struct i915_pmu *pmu = >->i915->pmu; - if (!pmu->base.event_init) + if (!pmu->registered) return; spin_lock_irq(&pmu->lock); @@ -324,7 +324,7 @@ void i915_pmu_gt_unparked(struct intel_gt *gt) { struct i915_pmu *pmu = >->i915->pmu; - if (!pmu->base.event_init) + if (!pmu->registered) return; spin_lock_irq(&pmu->lock); @@ -626,7 +626,7 @@ static int i915_pmu_event_init(struct perf_event *event) struct drm_i915_private *i915 = pmu_to_i915(pmu); int ret; - if (pmu->closed) + if (!pmu->registered) return -ENODEV; if (event->attr.type != event->pmu->type) @@ -724,7 +724,7 @@ static void i915_pmu_event_read(struct perf_event *event) struct hw_perf_event *hwc = &event->hw; u64 prev, new; - if (pmu->closed) { + if (!pmu->registered) { event->hw.state = PERF_HES_STOPPED; return; } @@ -850,7 +850,7 @@ static void i915_pmu_event_start(struct perf_event *event, int flags) { struct i915_pmu *pmu = event_to_pmu(event); - if (pmu->closed) + if (!pmu->registered) return; i915_pmu_enable(event); @@ -861,7 +861,7 @@ static void i915_pmu_event_stop(struct perf_event *event, int flags) { struct i915_pmu *pmu = event_to_pmu(event); - if (pmu->closed) + if (!pmu->registered) goto out; if (flags & PERF_EF_UPDATE) @@ -877,7 +877,7 @@ static int i915_pmu_event_add(struct perf_event *event, int flags) { struct i915_pmu *pmu = event_to_pmu(event); - if (pmu->closed) + if (!pmu->registered) return -ENODEV; if (flags & PERF_EF_START) @@ -1177,8 +1177,6 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node) { struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node); - GEM_BUG_ON(!pmu->base.event_init); - /* Select the first online CPU as a designated reader. */ if (cpumask_empty(&i915_pmu_cpumask)) cpumask_set_cpu(cpu, &i915_pmu_cpumask); @@ -1191,13 +1189,11 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node) struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node); unsigned int target = i915_pmu_target_cpu; - GEM_BUG_ON(!pmu->base.event_init); - /* * Unregistering an instance generates a CPU offline event which we must * ignore to avoid incorrectly modifying the shared i915_pmu_cpumask. */ - if (pmu->closed) + if (!pmu->registered) return 0; if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) { @@ -1218,7 +1214,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node) return 0; } -static enum cpuhp_state cpuhp_slot = CPUHP_INVALID; +static enum cpuhp_state cpuhp_state = CPUHP_INVALID; int i915_pmu_init(void) { @@ -1232,28 +1228,28 @@ int i915_pmu_init(void) pr_notice("Failed to setup cpuhp state for i915 PMU! (%d)\n", ret); else - cpuhp_slot = ret; + cpuhp_state = ret; return 0; } void i915_pmu_exit(void) { - if (cpuhp_slot != CPUHP_INVALID) - cpuhp_remove_multi_state(cpuhp_slot); + if (cpuhp_state != CPUHP_INVALID) + cpuhp_remove_multi_state(cpuhp_state); } static int i915_pmu_register_cpuhp_state(struct i915_pmu *pmu) { - if (cpuhp_slot == CPUHP_INVALID) + if (cpuhp_state == CPUHP_INVALID) return -EINVAL; - return cpuhp_state_add_instance(cpuhp_slot, &pmu->cpuhp.node); + return cpuhp_state_add_instance(cpuhp_state, &pmu->cpuhp.node); } static void i915_pmu_unregister_cpuhp_state(struct i915_pmu *pmu) { - cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node); + cpuhp_state_remove_instance(cpuhp_state, &pmu->cpuhp.node); } void i915_pmu_register(struct drm_i915_private *i915) @@ -1265,7 +1261,6 @@ void i915_pmu_register(struct drm_i915_private *i915) &i915_pmu_cpumask_attr_group, NULL }; - int ret = -ENOMEM; spin_lock_init(&pmu->lock); @@ -1316,6 +1311,8 @@ void i915_pmu_register(struct drm_i915_private *i915) if (ret) goto err_unreg; + pmu->registered = true; + return; err_unreg: @@ -1323,7 +1320,6 @@ err_unreg: err_groups: kfree(pmu->base.attr_groups); err_attr: - pmu->base.event_init = NULL; free_event_attributes(pmu); err_name: if (IS_DGFX(i915)) @@ -1336,23 +1332,17 @@ void i915_pmu_unregister(struct drm_i915_private *i915) { struct i915_pmu *pmu = &i915->pmu; - if (!pmu->base.event_init) + if (!pmu->registered) return; - /* - * "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu - * ensures all currently executing ones will have exited before we - * proceed with unregistration. - */ - pmu->closed = true; - synchronize_rcu(); + /* Disconnect the PMU callbacks */ + pmu->registered = false; hrtimer_cancel(&pmu->timer); i915_pmu_unregister_cpuhp_state(pmu); perf_pmu_unregister(&pmu->base); - pmu->base.event_init = NULL; kfree(pmu->base.attr_groups); if (IS_DGFX(i915)) kfree(pmu->name); diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h index 41af038c3738..8e66d63d0c9f 100644 --- a/drivers/gpu/drm/i915/i915_pmu.h +++ b/drivers/gpu/drm/i915/i915_pmu.h @@ -68,9 +68,9 @@ struct i915_pmu { */ struct pmu base; /** - * @closed: i915 is unregistering. + * @registered: PMU is registered and not in the unregistering process. */ - bool closed; + bool registered; /** * @name: Name as registered with perf core. */ diff --git a/drivers/gpu/drm/imagination/pvr_drv.c b/drivers/gpu/drm/imagination/pvr_drv.c index 85ee9abd1811..0639502137b4 100644 --- a/drivers/gpu/drm/imagination/pvr_drv.c +++ b/drivers/gpu/drm/imagination/pvr_drv.c @@ -1387,7 +1387,6 @@ static struct drm_driver pvr_drm_driver = { .name = PVR_DRIVER_NAME, .desc = PVR_DRIVER_DESC, - .date = PVR_DRIVER_DATE, .major = PVR_DRIVER_MAJOR, .minor = PVR_DRIVER_MINOR, .patchlevel = PVR_DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/imagination/pvr_drv.h b/drivers/gpu/drm/imagination/pvr_drv.h index 378fe477b759..7fa147312dd1 100644 --- a/drivers/gpu/drm/imagination/pvr_drv.h +++ b/drivers/gpu/drm/imagination/pvr_drv.h @@ -9,7 +9,6 @@ #define PVR_DRIVER_NAME "powervr" #define PVR_DRIVER_DESC "Imagination PowerVR (Series 6 and later) & IMG Graphics" -#define PVR_DRIVER_DATE "20230904" /* * Driver interface version: diff --git a/drivers/gpu/drm/imx/dcss/dcss-kms.c b/drivers/gpu/drm/imx/dcss/dcss-kms.c index 63a335c62296..3633e8f3aff6 100644 --- a/drivers/gpu/drm/imx/dcss/dcss-kms.c +++ b/drivers/gpu/drm/imx/dcss/dcss-kms.c @@ -3,11 +3,11 @@ * Copyright 2019 NXP. */ +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> #include <drm/drm_bridge_connector.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -34,7 +34,6 @@ static const struct drm_driver dcss_kms_driver = { .fops = &dcss_cma_fops, .name = "imx-dcss", .desc = "i.MX8MQ Display Subsystem", - .date = "20190917", .major = 1, .minor = 0, .patchlevel = 0, diff --git a/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c b/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c index 5f2c93c3c288..ec5fd9a01f1e 100644 --- a/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c +++ b/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c @@ -13,9 +13,9 @@ #include <video/imx-ipu-v3.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -163,7 +163,6 @@ static const struct drm_driver imx_drm_driver = { .fops = &imx_drm_driver_fops, .name = "imx-drm", .desc = "i.MX DRM graphics", - .date = "20120507", .major = 1, .minor = 0, .patchlevel = 0, diff --git a/drivers/gpu/drm/imx/lcdc/imx-lcdc.c b/drivers/gpu/drm/imx/lcdc/imx-lcdc.c index fa7d44623c52..8d6a0bb31c48 100644 --- a/drivers/gpu/drm/imx/lcdc/imx-lcdc.c +++ b/drivers/gpu/drm/imx/lcdc/imx-lcdc.c @@ -1,9 +1,9 @@ // SPDX-License-Identifier: GPL-2.0-only // SPDX-FileCopyrightText: 2020 Marian Cichy <M.Cichy@pengutronix.de> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_bridge.h> #include <drm/drm_bridge_connector.h> -#include <drm/drm_client_setup.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> @@ -352,7 +352,6 @@ static struct drm_driver imx_lcdc_drm_driver = { DRM_FBDEV_DMA_DRIVER_OPS, .name = "imx-lcdc", .desc = "i.MX LCDC driver", - .date = "20200716", }; static const struct of_device_id imx_lcdc_of_dev_id[] = { diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 8469e1e5e582..c23ee2d214de 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -20,11 +20,11 @@ #include <linux/pm.h> #include <linux/regmap.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> #include <drm/drm_bridge_connector.h> -#include <drm/drm_client_setup.h> #include <drm/drm_color_mgmt.h> #include <drm/drm_crtc.h> #include <drm/drm_damage_helper.h> @@ -953,7 +953,6 @@ static const struct drm_driver ingenic_drm_driver_data = { .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC, .name = "ingenic-drm", .desc = "DRM module for Ingenic SoCs", - .date = "20200716", .major = 1, .minor = 1, .patchlevel = 0, diff --git a/drivers/gpu/drm/kmb/kmb_drv.c b/drivers/gpu/drm/kmb/kmb_drv.c index a3d31de761cb..32cda134ae3e 100644 --- a/drivers/gpu/drm/kmb/kmb_drv.c +++ b/drivers/gpu/drm/kmb/kmb_drv.c @@ -13,8 +13,8 @@ #include <linux/pm_runtime.h> #include <linux/regmap.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -445,7 +445,6 @@ static const struct drm_driver kmb_driver = { DRM_FBDEV_DMA_DRIVER_OPS, .name = "kmb-drm", .desc = "KEEMBAY DISPLAY DRIVER", - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, }; diff --git a/drivers/gpu/drm/kmb/kmb_drv.h b/drivers/gpu/drm/kmb/kmb_drv.h index bf085e95b28f..1f0c10d317fe 100644 --- a/drivers/gpu/drm/kmb/kmb_drv.h +++ b/drivers/gpu/drm/kmb/kmb_drv.h @@ -16,7 +16,6 @@ #define KMB_MIN_WIDTH 1920 /*Max width in pixels */ #define KMB_MIN_HEIGHT 1080 /*Max height in pixels */ -#define DRIVER_DATE "20210223" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 1 diff --git a/drivers/gpu/drm/lima/lima_drv.c b/drivers/gpu/drm/lima/lima_drv.c index fb3062c872b3..2067c5b65c57 100644 --- a/drivers/gpu/drm/lima/lima_drv.c +++ b/drivers/gpu/drm/lima/lima_drv.c @@ -271,7 +271,6 @@ static const struct drm_driver lima_drm_driver = { .fops = &lima_drm_driver_fops, .name = "lima", .desc = "lima DRM", - .date = "20191231", .major = 1, .minor = 1, .patchlevel = 0, diff --git a/drivers/gpu/drm/logicvc/logicvc_drm.c b/drivers/gpu/drm/logicvc/logicvc_drm.c index fb9de5e0bc0e..204b0fee55d0 100644 --- a/drivers/gpu/drm/logicvc/logicvc_drm.c +++ b/drivers/gpu/drm/logicvc/logicvc_drm.c @@ -15,8 +15,8 @@ #include <linux/regmap.h> #include <linux/types.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_fourcc.h> @@ -52,7 +52,6 @@ static struct drm_driver logicvc_drm_driver = { .fops = &logicvc_drm_fops, .name = "logicvc-drm", .desc = "Xylon LogiCVC DRM driver", - .date = "20200403", .major = 1, .minor = 0, diff --git a/drivers/gpu/drm/loongson/lsdc_drv.c b/drivers/gpu/drm/loongson/lsdc_drv.c index b350bdcf1645..12193d2a301a 100644 --- a/drivers/gpu/drm/loongson/lsdc_drv.c +++ b/drivers/gpu/drm/loongson/lsdc_drv.c @@ -7,9 +7,9 @@ #include <linux/pci.h> #include <linux/vgaarb.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_ttm.h> #include <drm/drm_gem_framebuffer_helper.h> @@ -26,7 +26,6 @@ #define DRIVER_AUTHOR "Sui Jingfeng <suijingfeng@loongson.cn>" #define DRIVER_NAME "loongson" #define DRIVER_DESC "drm driver for loongson graphics" -#define DRIVER_DATE "20220701" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0 @@ -39,7 +38,6 @@ static const struct drm_driver lsdc_drm_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, @@ -232,9 +230,9 @@ lsdc_create_device(struct pci_dev *pdev, lsdc_gem_init(ddev); /* Bar 0 of the DC device contains the MMIO register's base address */ - ldev->reg_base = pcim_iomap(pdev, 0, 0); - if (!ldev->reg_base) - return ERR_PTR(-ENODEV); + ldev->reg_base = pcim_iomap_region(pdev, 0, "lsdc"); + if (IS_ERR(ldev->reg_base)) + return ldev->reg_base; spin_lock_init(&ldev->reglock); diff --git a/drivers/gpu/drm/mcde/mcde_drv.c b/drivers/gpu/drm/mcde/mcde_drv.c index c4d51f5f038d..5f2c462bad7e 100644 --- a/drivers/gpu/drm/mcde/mcde_drv.c +++ b/drivers/gpu/drm/mcde/mcde_drv.c @@ -65,9 +65,9 @@ #include <linux/slab.h> #include <linux/delay.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fb_dma_helper.h> #include <drm/drm_fbdev_dma.h> @@ -208,7 +208,6 @@ static const struct drm_driver mcde_drm_driver = { .fops = &drm_fops, .name = "mcde", .desc = DRIVER_DESC, - .date = "20180529", .major = 1, .minor = 0, .patchlevel = 0, diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c index 0829ceb9967c..49af2b19a85a 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c @@ -12,9 +12,9 @@ #include <linux/pm_runtime.h> #include <linux/dma-mapping.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_fourcc.h> @@ -33,7 +33,6 @@ #define DRIVER_NAME "mediatek" #define DRIVER_DESC "Mediatek SoC DRM" -#define DRIVER_DATE "20150513" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -615,7 +614,6 @@ static const struct drm_driver mtk_drm_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, }; diff --git a/drivers/gpu/drm/meson/meson_drv.c b/drivers/gpu/drm/meson/meson_drv.c index 0f5a1a54544e..81d2ee37e773 100644 --- a/drivers/gpu/drm/meson/meson_drv.c +++ b/drivers/gpu/drm/meson/meson_drv.c @@ -16,8 +16,8 @@ #include <linux/platform_device.h> #include <linux/soc/amlogic/meson-canvas.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -105,7 +105,6 @@ static const struct drm_driver meson_driver = { .fops = &fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = "20161109", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/mgag200/mgag200_drv.c b/drivers/gpu/drm/mgag200/mgag200_drv.c index 97fd7eb765b4..069fdd2dc8f6 100644 --- a/drivers/gpu/drm/mgag200/mgag200_drv.c +++ b/drivers/gpu/drm/mgag200/mgag200_drv.c @@ -10,8 +10,8 @@ #include <linux/module.h> #include <linux/pci.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_shmem.h> #include <drm/drm_file.h> @@ -97,7 +97,6 @@ static const struct drm_driver mgag200_driver = { .fops = &mgag200_driver_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/mgag200/mgag200_drv.h b/drivers/gpu/drm/mgag200/mgag200_drv.h index 988967eafbf2..0608fc63e588 100644 --- a/drivers/gpu/drm/mgag200/mgag200_drv.h +++ b/drivers/gpu/drm/mgag200/mgag200_drv.h @@ -25,7 +25,6 @@ #define DRIVER_NAME "mgag200" #define DRIVER_DESC "MGA G200 SE" -#define DRIVER_DATE "20110418" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index ffbcc97b5018..34b971490f10 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -11,7 +11,7 @@ #include <linux/of_address.h> #include <linux/uaccess.h> -#include <drm/drm_client_setup.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_file.h> #include <drm/drm_ioctl.h> @@ -910,7 +910,6 @@ static const struct drm_driver msm_driver = { .fops = &fops, .name = "msm", .desc = "MSM Snapdragon DRM", - .date = "20130625", .major = MSM_VERSION_MAJOR, .minor = MSM_VERSION_MINOR, .patchlevel = MSM_VERSION_PATCHLEVEL, diff --git a/drivers/gpu/drm/mxsfb/lcdif_drv.c b/drivers/gpu/drm/mxsfb/lcdif_drv.c index 51ae0b51b1e8..8ee00f59ca82 100644 --- a/drivers/gpu/drm/mxsfb/lcdif_drv.c +++ b/drivers/gpu/drm/mxsfb/lcdif_drv.c @@ -14,9 +14,9 @@ #include <linux/platform_device.h> #include <linux/pm_runtime.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_encoder.h> #include <drm/drm_fbdev_dma.h> @@ -248,7 +248,6 @@ static const struct drm_driver lcdif_driver = { .fops = &fops, .name = "imx-lcdif", .desc = "i.MX LCDIF Controller DRM", - .date = "20220417", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/mxsfb/mxsfb_drv.c b/drivers/gpu/drm/mxsfb/mxsfb_drv.c index 6b95e4eb3e4e..59020862cf65 100644 --- a/drivers/gpu/drm/mxsfb/mxsfb_drv.c +++ b/drivers/gpu/drm/mxsfb/mxsfb_drv.c @@ -17,9 +17,9 @@ #include <linux/property.h> #include <linux/pm_runtime.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> -#include <drm/drm_client_setup.h> #include <drm/drm_connector.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> @@ -336,7 +336,6 @@ static const struct drm_driver mxsfb_driver = { .fops = &fops, .name = "mxsfb-drm", .desc = "MXSFB Controller DRM", - .date = "20160824", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/nouveau/include/nvif/log.h b/drivers/gpu/drm/nouveau/include/nvif/log.h new file mode 100644 index 000000000000..64f6f8fc6141 --- /dev/null +++ b/drivers/gpu/drm/nouveau/include/nvif/log.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: MIT */ +/* SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. */ + +#ifndef __NVIF_LOG_H__ +#define __NVIF_LOG_H__ + +#ifdef CONFIG_DEBUG_FS + +/** + * nvif_log - structure for tracking logging buffers + * @entry: an entry in a list of struct nvif_logs + * @shutdown: pointer to function to call to clean up + * + * Structure used to track logging buffers so that they can be cleaned up + * when the module exits. + * + * The @shutdown function is called when the module exits. It should free all + * backing resources, such as logging buffers. + */ +struct nvif_log { + struct list_head entry; + void (*shutdown)(struct nvif_log *log); +}; + +/** + * nvif_logs - linked list of nvif_log objects + */ +struct nvif_logs { + struct list_head head; +}; + +#define NVIF_LOGS_DECLARE(logs) \ + struct nvif_logs logs = { LIST_HEAD_INIT(logs.head) } + +static inline void nvif_log_shutdown(struct nvif_logs *logs) +{ + if (!list_empty(&logs->head)) { + struct nvif_log *log, *n; + + list_for_each_entry_safe(log, n, &logs->head, entry) { + /* shutdown() should also delete the log entry */ + log->shutdown(log); + } + } +} + +extern struct nvif_logs gsp_logs; + +#endif + +#endif diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h index a2055f2a014a..5c5f4607fcc9 100644 --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h @@ -5,10 +5,13 @@ #include <core/falcon.h> #include <core/firmware.h> +#include <linux/debugfs.h> + #define GSP_PAGE_SHIFT 12 #define GSP_PAGE_SIZE BIT(GSP_PAGE_SHIFT) struct nvkm_gsp_mem { + struct device *dev; size_t size; void *data; dma_addr_t addr; @@ -219,6 +222,24 @@ struct nvkm_gsp { /* The size of the registry RPC */ size_t registry_rpc_size; + +#ifdef CONFIG_DEBUG_FS + /* + * Logging buffers in debugfs. The wrapper objects need to remain + * in memory until the dentry is deleted. + */ + struct { + struct dentry *parent; + struct dentry *init; + struct dentry *rm; + struct dentry *intr; + struct dentry *pmu; + } debugfs; + struct debugfs_blob_wrapper blob_init; + struct debugfs_blob_wrapper blob_intr; + struct debugfs_blob_wrapper blob_rm; + struct debugfs_blob_wrapper blob_pmu; +#endif }; static inline bool diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c b/drivers/gpu/drm/nouveau/nouveau_debugfs.c index e83db051e851..200e65a7cefc 100644 --- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c +++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c @@ -313,3 +313,19 @@ nouveau_debugfs_fini(struct nouveau_drm *drm) kfree(drm->debugfs); drm->debugfs = NULL; } + +int +nouveau_module_debugfs_init(void) +{ + nouveau_debugfs_root = debugfs_create_dir("nouveau", NULL); + if (IS_ERR(nouveau_debugfs_root)) + return PTR_ERR(nouveau_debugfs_root); + + return 0; +} + +void +nouveau_module_debugfs_fini(void) +{ + debugfs_remove(nouveau_debugfs_root); +} diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.h b/drivers/gpu/drm/nouveau/nouveau_debugfs.h index 77f0323b38ba..b7617b344ee2 100644 --- a/drivers/gpu/drm/nouveau/nouveau_debugfs.h +++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.h @@ -21,6 +21,11 @@ nouveau_debugfs(struct drm_device *dev) extern void nouveau_drm_debugfs_init(struct drm_minor *); extern int nouveau_debugfs_init(struct nouveau_drm *); extern void nouveau_debugfs_fini(struct nouveau_drm *); + +extern struct dentry *nouveau_debugfs_root; + +int nouveau_module_debugfs_init(void); +void nouveau_module_debugfs_fini(void); #else static inline void nouveau_drm_debugfs_init(struct drm_minor *minor) @@ -37,6 +42,17 @@ nouveau_debugfs_fini(struct nouveau_drm *drm) { } +static inline int +nouveau_module_debugfs_init(void) +{ + return 0; +} + +static inline void +nouveau_module_debugfs_fini(void) +{ +} + #endif #endif diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index 107f63f08bd9..21d2d9ca5e85 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -30,8 +30,9 @@ #include <linux/vga_switcheroo.h> #include <linux/mmu_notifier.h> #include <linux/dynamic_debug.h> +#include <linux/debugfs.h> -#include <drm/drm_client_setup.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_ttm.h> #include <drm/drm_gem_ttm_helper.h> @@ -47,6 +48,7 @@ #include <nvif/fifo.h> #include <nvif/push006c.h> #include <nvif/user.h> +#include <nvif/log.h> #include <nvif/class.h> #include <nvif/cl0002.h> @@ -113,6 +115,20 @@ static struct drm_driver driver_stub; static struct drm_driver driver_pci; static struct drm_driver driver_platform; +#ifdef CONFIG_DEBUG_FS +struct dentry *nouveau_debugfs_root; + +/** + * gsp_logs - list of nvif_log GSP-RM logging buffers + * + * Head pointer to a a list of nvif_log buffers that is created for each GPU + * upon GSP shutdown if the "keep_gsp_logging" command-line parameter is + * specified. This is used to track the alternative debugfs entries for the + * GSP-RM logs. + */ +NVIF_LOGS_DECLARE(gsp_logs); +#endif + static u64 nouveau_pci_name(struct pci_dev *pdev) { @@ -1326,11 +1342,6 @@ driver_stub = { .name = DRIVER_NAME, .desc = DRIVER_DESC, -#ifdef GIT_REVISION - .date = GIT_REVISION, -#else - .date = DRIVER_DATE, -#endif .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, @@ -1423,6 +1434,8 @@ err_free: static int __init nouveau_drm_init(void) { + int ret; + driver_pci = driver_stub; driver_platform = driver_stub; @@ -1436,6 +1449,10 @@ nouveau_drm_init(void) if (!nouveau_modeset) return 0; + ret = nouveau_module_debugfs_init(); + if (ret) + return ret; + #ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER platform_driver_register(&nouveau_platform_driver); #endif @@ -1444,10 +1461,14 @@ nouveau_drm_init(void) nouveau_backlight_ctor(); #ifdef CONFIG_PCI - return pci_register_driver(&nouveau_drm_pci_driver); -#else - return 0; + ret = pci_register_driver(&nouveau_drm_pci_driver); + if (ret) { + nouveau_module_debugfs_fini(); + return ret; + } #endif + + return 0; } static void __exit @@ -1467,6 +1488,12 @@ nouveau_drm_exit(void) #endif if (IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM)) mmu_notifier_synchronize(); + +#ifdef CONFIG_DEBUG_FS + nvif_log_shutdown(&gsp_logs); +#endif + + nouveau_module_debugfs_fini(); } module_init(nouveau_drm_init); diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 685d6ca3d8aa..55abc510067b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -7,7 +7,6 @@ #define DRIVER_NAME "nouveau" #define DRIVER_DESC "nVidia Riva/TNT/GeForce/Quadro/Tesla/Tegra K1+" -#define DRIVER_DATE "20120801" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 4 diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c index d586aea30898..58502102926b 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c @@ -26,6 +26,7 @@ #include <subdev/vfn.h> #include <engine/fifo/chan.h> #include <engine/sec2.h> +#include <nvif/log.h> #include <nvfw/fw.h> @@ -57,6 +58,8 @@ #include <linux/ctype.h> #include <linux/parser.h> +extern struct dentry *nouveau_debugfs_root; + #define GSP_MSG_MIN_SIZE GSP_PAGE_SIZE #define GSP_MSG_MAX_SIZE GSP_PAGE_MIN_SIZE * 16 @@ -121,6 +124,8 @@ r535_gsp_msgq_wait(struct nvkm_gsp *gsp, u32 repc, u32 *prepc, int *ptime) return mqe->data; } + size = ALIGN(repc + GSP_MSG_HDR_SIZE, GSP_PAGE_SIZE); + msg = kvmalloc(repc, GFP_KERNEL); if (!msg) return ERR_PTR(-ENOMEM); @@ -129,19 +134,15 @@ r535_gsp_msgq_wait(struct nvkm_gsp *gsp, u32 repc, u32 *prepc, int *ptime) len = min_t(u32, repc, len); memcpy(msg, mqe->data, len); - rptr += DIV_ROUND_UP(len, GSP_PAGE_SIZE); - if (rptr == gsp->msgq.cnt) - rptr = 0; - repc -= len; if (repc) { mqe = (void *)((u8 *)gsp->shm.msgq.ptr + 0x1000 + 0 * 0x1000); memcpy(msg + len, mqe, repc); - - rptr += DIV_ROUND_UP(repc, GSP_PAGE_SIZE); } + rptr = (rptr + DIV_ROUND_UP(size, GSP_PAGE_SIZE)) % gsp->msgq.cnt; + mb(); (*gsp->msgq.rptr) = rptr; return msg; @@ -163,7 +164,7 @@ r535_gsp_cmdq_push(struct nvkm_gsp *gsp, void *argv) u64 *end; u64 csum = 0; int free, time = 1000000; - u32 wptr, size; + u32 wptr, size, step; u32 off = 0; argc = ALIGN(GSP_MSG_HDR_SIZE + argc, GSP_PAGE_SIZE); @@ -197,7 +198,9 @@ r535_gsp_cmdq_push(struct nvkm_gsp *gsp, void *argv) } cqe = (void *)((u8 *)gsp->shm.cmdq.ptr + 0x1000 + wptr * 0x1000); - size = min_t(u32, argc, (gsp->cmdq.cnt - wptr) * GSP_PAGE_SIZE); + step = min_t(u32, free, (gsp->cmdq.cnt - wptr)); + size = min_t(u32, argc, step * GSP_PAGE_SIZE); + memcpy(cqe, (u8 *)cmd + off, size); wptr += DIV_ROUND_UP(size, 0x1000); @@ -1000,7 +1003,7 @@ r535_gsp_rpc_get_gsp_static_info(struct nvkm_gsp *gsp) } static void -nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem) +nvkm_gsp_mem_dtor(struct nvkm_gsp_mem *mem) { if (mem->data) { /* @@ -1009,19 +1012,35 @@ nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem) */ memset(mem->data, 0xFF, mem->size); - dma_free_coherent(gsp->subdev.device->dev, mem->size, mem->data, mem->addr); + dma_free_coherent(mem->dev, mem->size, mem->data, mem->addr); + put_device(mem->dev); + memset(mem, 0, sizeof(*mem)); } } +/** + * nvkm_gsp_mem_ctor - constructor for nvkm_gsp_mem objects + * @gsp: gsp pointer + * @size: number of bytes to allocate + * @mem: nvkm_gsp_mem object to initialize + * + * Allocates a block of memory for use with GSP. + * + * This memory block can potentially out-live the driver's remove() callback, + * so we take a device reference to ensure its lifetime. The reference is + * dropped in the destructor. + */ static int nvkm_gsp_mem_ctor(struct nvkm_gsp *gsp, size_t size, struct nvkm_gsp_mem *mem) { - mem->size = size; mem->data = dma_alloc_coherent(gsp->subdev.device->dev, size, &mem->addr, GFP_KERNEL); if (WARN_ON(!mem->data)) return -ENOMEM; + mem->size = size; + mem->dev = get_device(gsp->subdev.device->dev); + return 0; } @@ -1054,8 +1073,8 @@ r535_gsp_postinit(struct nvkm_gsp *gsp) nvkm_wr32(device, 0x110004, 0x00000040); /* Release the DMA buffers that were needed only for boot and init */ - nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw); - nvkm_gsp_mem_dtor(gsp, &gsp->libos); + nvkm_gsp_mem_dtor(&gsp->boot.fw); + nvkm_gsp_mem_dtor(&gsp->libos); return ret; } @@ -2060,6 +2079,215 @@ r535_gsp_rmargs_init(struct nvkm_gsp *gsp, bool resume) return 0; } +#ifdef CONFIG_DEBUG_FS + +/* + * If GSP-RM load fails, then the GSP nvkm object will be deleted, the logging + * debugfs entries will be deleted, and it will not be possible to debug the + * load failure. The keep_gsp_logging parameter tells Nouveau to copy the + * logging buffers to new debugfs entries, and these entries are retained + * until the driver unloads. + */ +static bool keep_gsp_logging; +module_param(keep_gsp_logging, bool, 0444); +MODULE_PARM_DESC(keep_gsp_logging, + "Migrate the GSP-RM logging debugfs entries upon exit"); + +/* + * GSP-RM uses a pseudo-class mechanism to define of a variety of per-"engine" + * data structures, and each engine has a "class ID" genererated by a + * pre-processor. This is the class ID for the PMU. + */ +#define NV_GSP_MSG_EVENT_UCODE_LIBOS_CLASS_PMU 0xf3d722 + +/** + * rpc_ucode_libos_print_v1E_08 - RPC payload for libos print buffers + * @ucode_eng_desc: the engine descriptor + * @libos_print_buf_size: the size of the libos_print_buf[] + * @libos_print_buf: the actual buffer + * + * The engine descriptor is divided into 31:8 "class ID" and 7:0 "instance + * ID". We only care about messages from PMU. + */ +struct rpc_ucode_libos_print_v1e_08 { + u32 ucode_eng_desc; + u32 libos_print_buf_size; + u8 libos_print_buf[]; +}; + +/** + * r535_gsp_msg_libos_print - capture log message from the PMU + * @priv: gsp pointer + * @fn: function number (ignored) + * @repv: pointer to libos print RPC + * @repc: message size + * + * Called when we receive a UCODE_LIBOS_PRINT event RPC from GSP-RM. This RPC + * contains the contents of the libos print buffer from PMU. It is typically + * only written to when PMU encounters an error. + * + * Technically this RPC can be used to pass print buffers from any number of + * GSP-RM engines, but we only expect to receive them for the PMU. + * + * For the PMU, the buffer is 4K in size and the RPC always contains the full + * contents. + */ +static int +r535_gsp_msg_libos_print(void *priv, u32 fn, void *repv, u32 repc) +{ + struct nvkm_gsp *gsp = priv; + struct nvkm_subdev *subdev = &gsp->subdev; + struct rpc_ucode_libos_print_v1e_08 *rpc = repv; + unsigned int class = rpc->ucode_eng_desc >> 8; + + nvkm_debug(subdev, "received libos print from class 0x%x for %u bytes\n", + class, rpc->libos_print_buf_size); + + if (class != NV_GSP_MSG_EVENT_UCODE_LIBOS_CLASS_PMU) { + nvkm_warn(subdev, + "received libos print from unknown class 0x%x\n", + class); + return -ENOMSG; + } + + if (rpc->libos_print_buf_size > GSP_PAGE_SIZE) { + nvkm_error(subdev, "libos print is too large (%u bytes)\n", + rpc->libos_print_buf_size); + return -E2BIG; + } + + memcpy(gsp->blob_pmu.data, rpc->libos_print_buf, rpc->libos_print_buf_size); + + return 0; +} + +/** + * create_debufgs - create a blob debugfs entry + * @gsp: gsp pointer + * @name: name of this dentry + * @blob: blob wrapper + * + * Creates a debugfs entry for a logging buffer with the name 'name'. + */ +static struct dentry *create_debugfs(struct nvkm_gsp *gsp, const char *name, + struct debugfs_blob_wrapper *blob) +{ + struct dentry *dent; + + dent = debugfs_create_blob(name, 0444, gsp->debugfs.parent, blob); + if (IS_ERR(dent)) { + nvkm_error(&gsp->subdev, + "failed to create %s debugfs entry\n", name); + return NULL; + } + + /* + * For some reason, debugfs_create_blob doesn't set the size of the + * dentry, so do that here. See [1] + * + * [1] https://lore.kernel.org/r/linux-fsdevel/20240207200619.3354549-1-ttabi@nvidia.com/ + */ + i_size_write(d_inode(dent), blob->size); + + return dent; +} + +/** + * r535_gsp_libos_debugfs_init - create logging debugfs entries + * @gsp: gsp pointer + * + * Create the debugfs entries. This exposes the log buffers to userspace so + * that an external tool can parse it. + * + * The 'logpmu' contains exception dumps from the PMU. It is written via an + * RPC sent from GSP-RM and must be only 4KB. We create it here because it's + * only useful if there is a debugfs entry to expose it. If we get the PMU + * logging RPC and there is no debugfs entry, the RPC is just ignored. + * + * The blob_init, blob_rm, and blob_pmu objects can't be transient + * because debugfs_create_blob doesn't copy them. + * + * NOTE: OpenRM loads the logging elf image and prints the log messages + * in real-time. We may add that capability in the future, but that + * requires loading ELF images that are not distributed with the driver and + * adding the parsing code to Nouveau. + * + * Ideally, this should be part of nouveau_debugfs_init(), but that function + * is called too late. We really want to create these debugfs entries before + * r535_gsp_booter_load() is called, so that if GSP-RM fails to initialize, + * there could still be a log to capture. + */ +static void +r535_gsp_libos_debugfs_init(struct nvkm_gsp *gsp) +{ + struct device *dev = gsp->subdev.device->dev; + + /* Create a new debugfs directory with a name unique to this GPU. */ + gsp->debugfs.parent = debugfs_create_dir(dev_name(dev), nouveau_debugfs_root); + if (IS_ERR(gsp->debugfs.parent)) { + nvkm_error(&gsp->subdev, + "failed to create %s debugfs root\n", dev_name(dev)); + return; + } + + gsp->blob_init.data = gsp->loginit.data; + gsp->blob_init.size = gsp->loginit.size; + gsp->blob_intr.data = gsp->logintr.data; + gsp->blob_intr.size = gsp->logintr.size; + gsp->blob_rm.data = gsp->logrm.data; + gsp->blob_rm.size = gsp->logrm.size; + + gsp->debugfs.init = create_debugfs(gsp, "loginit", &gsp->blob_init); + if (!gsp->debugfs.init) + goto error; + + gsp->debugfs.intr = create_debugfs(gsp, "logintr", &gsp->blob_intr); + if (!gsp->debugfs.intr) + goto error; + + gsp->debugfs.rm = create_debugfs(gsp, "logrm", &gsp->blob_rm); + if (!gsp->debugfs.rm) + goto error; + + /* + * Since the PMU buffer is copied from an RPC, it doesn't need to be + * a DMA buffer. + */ + gsp->blob_pmu.size = GSP_PAGE_SIZE; + gsp->blob_pmu.data = kzalloc(gsp->blob_pmu.size, GFP_KERNEL); + if (!gsp->blob_pmu.data) + goto error; + + gsp->debugfs.pmu = create_debugfs(gsp, "logpmu", &gsp->blob_pmu); + if (!gsp->debugfs.pmu) { + kfree(gsp->blob_pmu.data); + goto error; + } + + i_size_write(d_inode(gsp->debugfs.init), gsp->blob_init.size); + i_size_write(d_inode(gsp->debugfs.intr), gsp->blob_intr.size); + i_size_write(d_inode(gsp->debugfs.rm), gsp->blob_rm.size); + i_size_write(d_inode(gsp->debugfs.pmu), gsp->blob_pmu.size); + + r535_gsp_msg_ntfy_add(gsp, NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, + r535_gsp_msg_libos_print, gsp); + + nvkm_debug(&gsp->subdev, "created debugfs GSP-RM logging entries\n"); + + if (keep_gsp_logging) { + nvkm_info(&gsp->subdev, + "logging buffers will be retained on failure\n"); + } + + return; + +error: + debugfs_remove(gsp->debugfs.parent); + gsp->debugfs.parent = NULL; +} + +#endif + static inline u64 r535_gsp_libos_id8(const char *name) { @@ -2110,7 +2338,11 @@ static void create_pte_array(u64 *ptes, dma_addr_t addr, size_t size) * written to directly by GSP-RM and can be any multiple of GSP_PAGE_SIZE. * * The physical address map for the log buffer is stored in the buffer - * itself, starting with offset 1. Offset 0 contains the "put" pointer. + * itself, starting with offset 1. Offset 0 contains the "put" pointer (pp). + * Initially, pp is equal to 0. If the buffer has valid logging data in it, + * then pp points to index into the buffer where the next logging entry will + * be written. Therefore, the logging data is valid if: + * 1 <= pp < sizeof(buffer)/sizeof(u64) * * The GSP only understands 4K pages (GSP_PAGE_SIZE), so even if the kernel is * configured for a larger page size (e.g. 64K pages), we need to give @@ -2181,6 +2413,11 @@ r535_gsp_libos_init(struct nvkm_gsp *gsp) args[3].size = gsp->rmargs.size; args[3].kind = LIBOS_MEMORY_REGION_CONTIGUOUS; args[3].loc = LIBOS_MEMORY_REGION_LOC_SYSMEM; + +#ifdef CONFIG_DEBUG_FS + r535_gsp_libos_debugfs_init(gsp); +#endif + return 0; } @@ -2234,8 +2471,8 @@ static void nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3) { nvkm_gsp_sg_free(gsp->subdev.device, &rx3->lvl2); - nvkm_gsp_mem_dtor(gsp, &rx3->lvl1); - nvkm_gsp_mem_dtor(gsp, &rx3->lvl0); + nvkm_gsp_mem_dtor(&rx3->lvl1); + nvkm_gsp_mem_dtor(&rx3->lvl0); } /** @@ -2323,9 +2560,9 @@ nvkm_gsp_radix3_sg(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size, if (ret) { lvl2_fail: - nvkm_gsp_mem_dtor(gsp, &rx3->lvl1); + nvkm_gsp_mem_dtor(&rx3->lvl1); lvl1_fail: - nvkm_gsp_mem_dtor(gsp, &rx3->lvl0); + nvkm_gsp_mem_dtor(&rx3->lvl0); } return ret; @@ -2417,7 +2654,7 @@ r535_gsp_init(struct nvkm_gsp *gsp) done: if (gsp->sr.meta.data) { - nvkm_gsp_mem_dtor(gsp, &gsp->sr.meta); + nvkm_gsp_mem_dtor(&gsp->sr.meta); nvkm_gsp_radix3_dtor(gsp, &gsp->sr.radix3); nvkm_gsp_sg_free(gsp->subdev.device, &gsp->sr.sgt); return ret; @@ -2491,6 +2728,222 @@ r535_gsp_dtor_fws(struct nvkm_gsp *gsp) gsp->fws.rm = NULL; } +#ifdef CONFIG_DEBUG_FS + +struct r535_gsp_log { + struct nvif_log log; + + /* + * Logging buffers in debugfs. The wrapper objects need to remain + * in memory until the dentry is deleted. + */ + struct dentry *debugfs_logging_dir; + struct debugfs_blob_wrapper blob_init; + struct debugfs_blob_wrapper blob_intr; + struct debugfs_blob_wrapper blob_rm; + struct debugfs_blob_wrapper blob_pmu; +}; + +/** + * r535_debugfs_shutdown - delete GSP-RM logging buffers for one GPU + * @_log: nvif_log struct for this GPU + * + * Called when the driver is shutting down, to clean up the retained GSP-RM + * logging buffers. + */ +static void r535_debugfs_shutdown(struct nvif_log *_log) +{ + struct r535_gsp_log *log = container_of(_log, struct r535_gsp_log, log); + + debugfs_remove(log->debugfs_logging_dir); + + kfree(log->blob_init.data); + kfree(log->blob_intr.data); + kfree(log->blob_rm.data); + kfree(log->blob_pmu.data); + + /* We also need to delete the list object */ + kfree(log); +} + +/** + * is_empty - return true if the logging buffer was never written to + * @b: blob wrapper with ->data field pointing to logging buffer + * + * The first 64-bit field of loginit, and logintr, and logrm is the 'put' + * pointer, and it is initialized to 0. It's a dword-based index into the + * circular buffer, indicating where the next printf write will be made. + * + * If the pointer is still 0 when GSP-RM is shut down, that means that the + * buffer was never written to, so it can be ignored. + * + * This test also works for logpmu, even though it doesn't have a put pointer. + */ +static bool is_empty(const struct debugfs_blob_wrapper *b) +{ + u64 *put = b->data; + + return put ? (*put == 0) : true; +} + +/** + * r535_gsp_copy_log - preserve the logging buffers in a blob + * + * When GSP shuts down, the nvkm_gsp object and all its memory is deleted. + * To preserve the logging buffers, the buffers need to be copied, but only + * if they actually have data. + */ +static int r535_gsp_copy_log(struct dentry *parent, + const char *name, + const struct debugfs_blob_wrapper *s, + struct debugfs_blob_wrapper *t) +{ + struct dentry *dent; + void *p; + + if (is_empty(s)) + return 0; + + /* The original buffers will be deleted */ + p = kmemdup(s->data, s->size, GFP_KERNEL); + if (!p) + return -ENOMEM; + + t->data = p; + t->size = s->size; + + dent = debugfs_create_blob(name, 0444, parent, t); + if (IS_ERR(dent)) { + kfree(p); + memset(t, 0, sizeof(*t)); + return PTR_ERR(dent); + } + + i_size_write(d_inode(dent), t->size); + + return 0; +} + +/** + * r535_gsp_retain_logging - copy logging buffers to new debugfs root + * @gsp: gsp pointer + * + * If keep_gsp_logging is enabled, then we want to preserve the GSP-RM logging + * buffers and their debugfs entries, but all those objects would normally + * deleted if GSP-RM fails to load. + * + * To preserve the logging buffers, we need to: + * + * 1) Allocate new buffers and copy the logs into them, so that the original + * DMA buffers can be released. + * + * 2) Preserve the directories. We don't need to save single dentries because + * we're going to delete the parent when the + * + * If anything fails in this process, then all the dentries need to be + * deleted. We don't need to deallocate the original logging buffers because + * the caller will do that regardless. + */ +static void r535_gsp_retain_logging(struct nvkm_gsp *gsp) +{ + struct device *dev = gsp->subdev.device->dev; + struct r535_gsp_log *log = NULL; + int ret; + + if (!keep_gsp_logging || !gsp->debugfs.parent) { + /* Nothing to do */ + goto exit; + } + + /* Check to make sure at least one buffer has data. */ + if (is_empty(&gsp->blob_init) && is_empty(&gsp->blob_intr) && + is_empty(&gsp->blob_rm) && is_empty(&gsp->blob_rm)) { + nvkm_warn(&gsp->subdev, "all logging buffers are empty\n"); + goto exit; + } + + log = kzalloc(sizeof(*log), GFP_KERNEL); + if (!log) + goto error; + + /* + * Since the nvkm_gsp object is going away, the debugfs_blob_wrapper + * objects are also being deleted, which means the dentries will no + * longer be valid. Delete the existing entries so that we can create + * new ones with the same name. + */ + debugfs_remove(gsp->debugfs.init); + debugfs_remove(gsp->debugfs.intr); + debugfs_remove(gsp->debugfs.rm); + debugfs_remove(gsp->debugfs.pmu); + + ret = r535_gsp_copy_log(gsp->debugfs.parent, "loginit", &gsp->blob_init, &log->blob_init); + if (ret) + goto error; + + ret = r535_gsp_copy_log(gsp->debugfs.parent, "logintr", &gsp->blob_intr, &log->blob_intr); + if (ret) + goto error; + + ret = r535_gsp_copy_log(gsp->debugfs.parent, "logrm", &gsp->blob_rm, &log->blob_rm); + if (ret) + goto error; + + ret = r535_gsp_copy_log(gsp->debugfs.parent, "logpmu", &gsp->blob_pmu, &log->blob_pmu); + if (ret) + goto error; + + /* The nvkm_gsp object is going away, so save the dentry */ + log->debugfs_logging_dir = gsp->debugfs.parent; + + log->log.shutdown = r535_debugfs_shutdown; + list_add(&log->log.entry, &gsp_logs.head); + + nvkm_warn(&gsp->subdev, + "logging buffers migrated to /sys/kernel/debug/nouveau/%s\n", + dev_name(dev)); + + return; + +error: + nvkm_warn(&gsp->subdev, "failed to migrate logging buffers\n"); + +exit: + debugfs_remove(gsp->debugfs.parent); + + if (log) { + kfree(log->blob_init.data); + kfree(log->blob_intr.data); + kfree(log->blob_rm.data); + kfree(log->blob_pmu.data); + kfree(log); + } +} + +#endif + +/** + * r535_gsp_libos_debugfs_fini - cleanup/retain log buffers on shutdown + * @gsp: gsp pointer + * + * If the log buffers are exposed via debugfs, the data for those entries + * needs to be cleaned up when the GSP device shuts down. + */ +static void +r535_gsp_libos_debugfs_fini(struct nvkm_gsp __maybe_unused *gsp) +{ +#ifdef CONFIG_DEBUG_FS + r535_gsp_retain_logging(gsp); + + /* + * Unlike the other buffers, the PMU blob is a kmalloc'd buffer that + * exists only if the debugfs entries were created. + */ + kfree(gsp->blob_pmu.data); + gsp->blob_pmu.data = NULL; +#endif +} + void r535_gsp_dtor(struct nvkm_gsp *gsp) { @@ -2498,7 +2951,7 @@ r535_gsp_dtor(struct nvkm_gsp *gsp) mutex_destroy(&gsp->client_id.mutex); nvkm_gsp_radix3_dtor(gsp, &gsp->radix3); - nvkm_gsp_mem_dtor(gsp, &gsp->sig); + nvkm_gsp_mem_dtor(&gsp->sig); nvkm_firmware_dtor(&gsp->fw); nvkm_falcon_fw_dtor(&gsp->booter.unload); @@ -2509,12 +2962,15 @@ r535_gsp_dtor(struct nvkm_gsp *gsp) r535_gsp_dtor_fws(gsp); - nvkm_gsp_mem_dtor(gsp, &gsp->rmargs); - nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta); - nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem); - nvkm_gsp_mem_dtor(gsp, &gsp->loginit); - nvkm_gsp_mem_dtor(gsp, &gsp->logintr); - nvkm_gsp_mem_dtor(gsp, &gsp->logrm); + nvkm_gsp_mem_dtor(&gsp->rmargs); + nvkm_gsp_mem_dtor(&gsp->wpr_meta); + nvkm_gsp_mem_dtor(&gsp->shm.mem); + + r535_gsp_libos_debugfs_fini(gsp); + + nvkm_gsp_mem_dtor(&gsp->loginit); + nvkm_gsp_mem_dtor(&gsp->logintr); + nvkm_gsp_mem_dtor(&gsp->logrm); } int diff --git a/drivers/gpu/drm/omapdrm/omap_drv.c b/drivers/gpu/drm/omapdrm/omap_drv.c index e27376121606..054b71dba6a7 100644 --- a/drivers/gpu/drm/omapdrm/omap_drv.c +++ b/drivers/gpu/drm/omapdrm/omap_drv.c @@ -28,7 +28,6 @@ #define DRIVER_NAME MODULE_NAME #define DRIVER_DESC "OMAP DRM" -#define DRIVER_DATE "20110917" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0 @@ -653,7 +652,6 @@ static const struct drm_driver omap_drm_driver = { .fops = &omapdriver_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/omapdrm/omap_fbdev.c b/drivers/gpu/drm/omapdrm/omap_fbdev.c index f4bd0c6e3f34..7b6396890681 100644 --- a/drivers/gpu/drm/omapdrm/omap_fbdev.c +++ b/drivers/gpu/drm/omapdrm/omap_fbdev.c @@ -6,7 +6,7 @@ #include <linux/fb.h> -#include <drm/drm_client_setup.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_crtc_helper.h> #include <drm/drm_fb_helper.h> diff --git a/drivers/gpu/drm/panel/panel-edp.c b/drivers/gpu/drm/panel/panel-edp.c index 94a46241dece..f8511fe5fb0d 100644 --- a/drivers/gpu/drm/panel/panel-edp.c +++ b/drivers/gpu/drm/panel/panel-edp.c @@ -1802,6 +1802,12 @@ static const struct panel_delay delay_200_500_e50_po2e200 = { .powered_on_to_enable = 200, }; +static const struct panel_delay delay_200_150_e50 = { + .hpd_absent = 200, + .unprepare = 150, + .enable = 50, +}; + #define EDP_PANEL_ENTRY(vend_chr_0, vend_chr_1, vend_chr_2, product_id, _delay, _name) \ { \ .ident = { \ @@ -1913,6 +1919,7 @@ static const struct edp_panel_entry edp_panels[] = { EDP_PANEL_ENTRY('B', 'O', 'E', 0x0b56, &delay_200_500_e80, "NT140FHM-N47"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x0b66, &delay_200_500_e80, "NE140WUM-N6G"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x0c20, &delay_200_500_e80, "NT140FHM-N47"), + EDP_PANEL_ENTRY('B', 'O', 'E', 0x0c93, &delay_200_500_e200, "Unknown"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x0cb6, &delay_200_500_e200, "NT116WHM-N44"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x0cfa, &delay_200_500_e50, "NV116WHM-A4D"), @@ -1963,6 +1970,7 @@ static const struct edp_panel_entry edp_panels[] = { EDP_PANEL_ENTRY('K', 'D', 'B', 0x1118, &delay_200_500_e50, "KD116N29-30NK-A005"), EDP_PANEL_ENTRY('K', 'D', 'B', 0x1120, &delay_200_500_e80_d50, "116N29-30NK-C007"), EDP_PANEL_ENTRY('K', 'D', 'B', 0x1212, &delay_200_500_e50, "KD116N0930A16"), + EDP_PANEL_ENTRY('K', 'D', 'B', 0x1707, &delay_200_150_e50, "KD116N2130B12"), EDP_PANEL_ENTRY('K', 'D', 'C', 0x044f, &delay_200_500_e50, "KD116N9-30NH-F3"), EDP_PANEL_ENTRY('K', 'D', 'C', 0x05f1, &delay_200_500_e80_d50, "KD116N5-30NV-G7"), diff --git a/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c b/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c index 4618c892cdd6..e10e469aa7a6 100644 --- a/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c +++ b/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c @@ -400,7 +400,7 @@ static int rpi_touchscreen_probe(struct i2c_client *i2c) rpi_touchscreen_i2c_write(ts, REG_POWERON, 0); /* Look up the DSI host. It needs to probe before we do. */ - endpoint = of_graph_get_next_endpoint(dev->of_node, NULL); + endpoint = of_graph_get_endpoint_by_regs(dev->of_node, 0, -1); if (!endpoint) return -ENODEV; diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index ee3864476eb9..0f3935556ac7 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -636,7 +636,6 @@ static const struct drm_driver panfrost_drm_driver = { .fops = &panfrost_drm_driver_fops, .name = "panfrost", .desc = "panfrost DRM", - .date = "20180908", .major = 1, .minor = 3, diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c index f5abde3866fb..174e190ba40f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c @@ -236,6 +236,10 @@ static const struct panfrost_model gpu_models[] = { */ GPU_MODEL(g57, 0x9003, GPU_REV(g57, 0, 0)), + + /* MediaTek MT8188 Mali-G57 MC3 */ + GPU_MODEL(g57, 0x9093, + GPU_REV(g57, 0, 0)), }; static void panfrost_gpu_init_features(struct panfrost_device *pfdev) diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c index 6fbff516c1c1..00f7b8ce935a 100644 --- a/drivers/gpu/drm/panthor/panthor_device.c +++ b/drivers/gpu/drm/panthor/panthor_device.c @@ -22,6 +22,24 @@ #include "panthor_regs.h" #include "panthor_sched.h" +static int panthor_gpu_coherency_init(struct panthor_device *ptdev) +{ + ptdev->coherent = device_get_dma_attr(ptdev->base.dev) == DEV_DMA_COHERENT; + + if (!ptdev->coherent) + return 0; + + /* Check if the ACE-Lite coherency protocol is actually supported by the GPU. + * ACE protocol has never been supported for command stream frontend GPUs. + */ + if ((gpu_read(ptdev, GPU_COHERENCY_FEATURES) & + GPU_COHERENCY_PROT_BIT(ACE_LITE))) + return 0; + + drm_err(&ptdev->base, "Coherency not supported by the device"); + return -ENOTSUPP; +} + static int panthor_clk_init(struct panthor_device *ptdev) { ptdev->clks.core = devm_clk_get(ptdev->base.dev, NULL); @@ -156,7 +174,9 @@ int panthor_device_init(struct panthor_device *ptdev) struct page *p; int ret; - ptdev->coherent = device_get_dma_attr(ptdev->base.dev) == DEV_DMA_COHERENT; + ret = panthor_gpu_coherency_init(ptdev); + if (ret) + return ret; init_completion(&ptdev->unplug.done); ret = drmm_mutex_init(&ptdev->base, &ptdev->unplug.lock); diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index 0b3fbee3d37a..32421bad3097 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -1505,7 +1505,6 @@ static const struct drm_driver panthor_drm_driver = { .fops = &panthor_drm_driver_fops, .name = "panthor", .desc = "Panthor DRM driver", - .date = "20230801", .major = 1, .minor = 2, diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c index ecca5565ce41..4a2e36504fea 100644 --- a/drivers/gpu/drm/panthor/panthor_fw.c +++ b/drivers/gpu/drm/panthor/panthor_fw.c @@ -91,26 +91,26 @@ enum panthor_fw_binary_entry_type { #define CSF_FW_BINARY_ENTRY_UPDATE BIT(30) #define CSF_FW_BINARY_ENTRY_OPTIONAL BIT(31) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_RD BIT(0) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_WR BIT(1) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_EX BIT(2) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_CACHE_MODE_NONE (0 << 3) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_CACHE_MODE_CACHED (1 << 3) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_CACHE_MODE_UNCACHED_COHERENT (2 << 3) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_CACHE_MODE_CACHED_COHERENT (3 << 3) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_CACHE_MODE_MASK GENMASK(4, 3) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_PROT BIT(5) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_SHARED BIT(30) -#define CSF_FW_BINARY_IFACE_ENTRY_RD_ZERO BIT(31) - -#define CSF_FW_BINARY_IFACE_ENTRY_RD_SUPPORTED_FLAGS \ - (CSF_FW_BINARY_IFACE_ENTRY_RD_RD | \ - CSF_FW_BINARY_IFACE_ENTRY_RD_WR | \ - CSF_FW_BINARY_IFACE_ENTRY_RD_EX | \ - CSF_FW_BINARY_IFACE_ENTRY_RD_CACHE_MODE_MASK | \ - CSF_FW_BINARY_IFACE_ENTRY_RD_PROT | \ - CSF_FW_BINARY_IFACE_ENTRY_RD_SHARED | \ - CSF_FW_BINARY_IFACE_ENTRY_RD_ZERO) +#define CSF_FW_BINARY_IFACE_ENTRY_RD BIT(0) +#define CSF_FW_BINARY_IFACE_ENTRY_WR BIT(1) +#define CSF_FW_BINARY_IFACE_ENTRY_EX BIT(2) +#define CSF_FW_BINARY_IFACE_ENTRY_CACHE_MODE_NONE (0 << 3) +#define CSF_FW_BINARY_IFACE_ENTRY_CACHE_MODE_CACHED (1 << 3) +#define CSF_FW_BINARY_IFACE_ENTRY_CACHE_MODE_UNCACHED_COHERENT (2 << 3) +#define CSF_FW_BINARY_IFACE_ENTRY_CACHE_MODE_CACHED_COHERENT (3 << 3) +#define CSF_FW_BINARY_IFACE_ENTRY_CACHE_MODE_MASK GENMASK(4, 3) +#define CSF_FW_BINARY_IFACE_ENTRY_PROT BIT(5) +#define CSF_FW_BINARY_IFACE_ENTRY_SHARED BIT(30) +#define CSF_FW_BINARY_IFACE_ENTRY_ZERO BIT(31) + +#define CSF_FW_BINARY_IFACE_ENTRY_SUPPORTED_FLAGS \ + (CSF_FW_BINARY_IFACE_ENTRY_RD | \ + CSF_FW_BINARY_IFACE_ENTRY_WR | \ + CSF_FW_BINARY_IFACE_ENTRY_EX | \ + CSF_FW_BINARY_IFACE_ENTRY_CACHE_MODE_MASK | \ + CSF_FW_BINARY_IFACE_ENTRY_PROT | \ + CSF_FW_BINARY_IFACE_ENTRY_SHARED | \ + CSF_FW_BINARY_IFACE_ENTRY_ZERO) /** * struct panthor_fw_binary_section_entry_hdr - Describes a section of FW binary @@ -413,7 +413,7 @@ static void panthor_fw_init_section_mem(struct panthor_device *ptdev, int ret; if (!section->data.size && - !(section->flags & CSF_FW_BINARY_IFACE_ENTRY_RD_ZERO)) + !(section->flags & CSF_FW_BINARY_IFACE_ENTRY_ZERO)) return; ret = panthor_kernel_bo_vmap(section->mem); @@ -421,7 +421,7 @@ static void panthor_fw_init_section_mem(struct panthor_device *ptdev, return; memcpy(section->mem->kmap, section->data.buf, section->data.size); - if (section->flags & CSF_FW_BINARY_IFACE_ENTRY_RD_ZERO) { + if (section->flags & CSF_FW_BINARY_IFACE_ENTRY_ZERO) { memset(section->mem->kmap + section->data.size, 0, panthor_kernel_bo_size(section->mem) - section->data.size); } @@ -535,20 +535,20 @@ static int panthor_fw_load_section_entry(struct panthor_device *ptdev, return -EINVAL; } - if (hdr.flags & ~CSF_FW_BINARY_IFACE_ENTRY_RD_SUPPORTED_FLAGS) { + if (hdr.flags & ~CSF_FW_BINARY_IFACE_ENTRY_SUPPORTED_FLAGS) { drm_err(&ptdev->base, "Firmware contains interface with unsupported flags (0x%x)\n", hdr.flags); return -EINVAL; } - if (hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_RD_PROT) { + if (hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_PROT) { drm_warn(&ptdev->base, "Firmware protected mode entry not be supported, ignoring"); return 0; } if (hdr.va.start == CSF_MCU_SHARED_REGION_START && - !(hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_RD_SHARED)) { + !(hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_SHARED)) { drm_err(&ptdev->base, "Interface at 0x%llx must be shared", CSF_MCU_SHARED_REGION_START); return -EINVAL; @@ -587,26 +587,26 @@ static int panthor_fw_load_section_entry(struct panthor_device *ptdev, section_size = hdr.va.end - hdr.va.start; if (section_size) { - u32 cache_mode = hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_RD_CACHE_MODE_MASK; + u32 cache_mode = hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_CACHE_MODE_MASK; struct panthor_gem_object *bo; u32 vm_map_flags = 0; struct sg_table *sgt; u64 va = hdr.va.start; - if (!(hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_RD_WR)) + if (!(hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_WR)) vm_map_flags |= DRM_PANTHOR_VM_BIND_OP_MAP_READONLY; - if (!(hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_RD_EX)) + if (!(hdr.flags & CSF_FW_BINARY_IFACE_ENTRY_EX)) vm_map_flags |= DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC; - /* TODO: CSF_FW_BINARY_IFACE_ENTRY_RD_CACHE_MODE_*_COHERENT are mapped to + /* TODO: CSF_FW_BINARY_IFACE_ENTRY_CACHE_MODE_*_COHERENT are mapped to * non-cacheable for now. We might want to introduce a new * IOMMU_xxx flag (or abuse IOMMU_MMIO, which maps to device * memory and is currently not used by our driver) for * AS_MEMATTR_AARCH64_SHARED memory, so we can take benefit * of IO-coherent systems. */ - if (cache_mode != CSF_FW_BINARY_IFACE_ENTRY_RD_CACHE_MODE_CACHED) + if (cache_mode != CSF_FW_BINARY_IFACE_ENTRY_CACHE_MODE_CACHED) vm_map_flags |= DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED; section->mem = panthor_kernel_bo_create(ptdev, panthor_fw_vm(ptdev), @@ -619,7 +619,7 @@ static int panthor_fw_load_section_entry(struct panthor_device *ptdev, if (drm_WARN_ON(&ptdev->base, section->mem->va_node.start != hdr.va.start)) return -EINVAL; - if (section->flags & CSF_FW_BINARY_IFACE_ENTRY_RD_SHARED) { + if (section->flags & CSF_FW_BINARY_IFACE_ENTRY_SHARED) { ret = panthor_kernel_bo_vmap(section->mem); if (ret) return ret; @@ -689,7 +689,7 @@ panthor_reload_fw_sections(struct panthor_device *ptdev, bool full_reload) list_for_each_entry(section, &ptdev->fw->sections, node) { struct sg_table *sgt; - if (!full_reload && !(section->flags & CSF_FW_BINARY_IFACE_ENTRY_RD_WR)) + if (!full_reload && !(section->flags & CSF_FW_BINARY_IFACE_ENTRY_WR)) continue; panthor_fw_init_section_mem(ptdev, section); @@ -1098,17 +1098,11 @@ void panthor_fw_pre_reset(struct panthor_device *ptdev, bool on_hang) panthor_fw_update_reqs(glb_iface, req, GLB_HALT, GLB_HALT); gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1); if (!readl_poll_timeout(ptdev->iomem + MCU_STATUS, status, - status == MCU_STATUS_HALT, 10, 100000) && - glb_iface->output->halt_status == PANTHOR_FW_HALT_OK) { + status == MCU_STATUS_HALT, 10, 100000)) { ptdev->fw->fast_reset = true; } else { drm_warn(&ptdev->base, "Failed to cleanly suspend MCU"); } - - /* The FW detects 0 -> 1 transitions. Make sure we reset - * the HALT bit before the FW is rebooted. - */ - panthor_fw_update_reqs(glb_iface, req, 0, GLB_HALT); } panthor_job_irq_suspend(&ptdev->fw->irq); @@ -1134,6 +1128,14 @@ int panthor_fw_post_reset(struct panthor_device *ptdev) * the FW sections. If it fails, go for a full reset. */ if (ptdev->fw->fast_reset) { + /* The FW detects 0 -> 1 transitions. Make sure we reset + * the HALT bit before the FW is rebooted. + * This is not needed on a slow reset because FW sections are + * re-initialized. + */ + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev); + panthor_fw_update_reqs(glb_iface, req, 0, GLB_HALT); + ret = panthor_fw_start(ptdev); if (!ret) goto out; diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c index 2d3529a0b156..0f3cac6ec88e 100644 --- a/drivers/gpu/drm/panthor/panthor_gpu.c +++ b/drivers/gpu/drm/panthor/panthor_gpu.c @@ -77,6 +77,12 @@ static const struct panthor_model gpu_models[] = { GPU_IRQ_RESET_COMPLETED | \ GPU_IRQ_CLEAN_CACHES_COMPLETED) +static void panthor_gpu_coherency_set(struct panthor_device *ptdev) +{ + gpu_write(ptdev, GPU_COHERENCY_PROTOCOL, + ptdev->coherent ? GPU_COHERENCY_PROT_BIT(ACE_LITE) : GPU_COHERENCY_NONE); +} + static void panthor_gpu_init_info(struct panthor_device *ptdev) { const struct panthor_model *model; @@ -365,6 +371,9 @@ int panthor_gpu_l2_power_on(struct panthor_device *ptdev) hweight64(ptdev->gpu_info.shader_present)); } + /* Set the desired coherency mode before the power up of L2 */ + panthor_gpu_coherency_set(ptdev); + return panthor_gpu_power_on(ptdev, L2, 1, 20000); } diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c index a49132f3778b..c3f0b0225cf9 100644 --- a/drivers/gpu/drm/panthor/panthor_mmu.c +++ b/drivers/gpu/drm/panthor/panthor_mmu.c @@ -1941,7 +1941,7 @@ struct panthor_heap_pool *panthor_vm_get_heap_pool(struct panthor_vm *vm, bool c return pool; } -static u64 mair_to_memattr(u64 mair) +static u64 mair_to_memattr(u64 mair, bool coherent) { u64 memattr = 0; u32 i; @@ -1960,14 +1960,21 @@ static u64 mair_to_memattr(u64 mair) AS_MEMATTR_AARCH64_SH_MIDGARD_INNER | AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(false, false); } else { - /* Use SH_CPU_INNER mode so SH_IS, which is used when - * IOMMU_CACHE is set, actually maps to the standard - * definition of inner-shareable and not Mali's - * internal-shareable mode. - */ out_attr = AS_MEMATTR_AARCH64_INNER_OUTER_WB | - AS_MEMATTR_AARCH64_SH_CPU_INNER | AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(inner & 1, inner & 2); + /* Use SH_MIDGARD_INNER mode when device isn't coherent, + * so SH_IS, which is used when IOMMU_CACHE is set, maps + * to Mali's internal-shareable mode. As per the Mali + * Spec, inner and outer-shareable modes aren't allowed + * for WB memory when coherency is disabled. + * Use SH_CPU_INNER mode when coherency is enabled, so + * that SH_IS actually maps to the standard definition of + * inner-shareable. + */ + if (!coherent) + out_attr |= AS_MEMATTR_AARCH64_SH_MIDGARD_INNER; + else + out_attr |= AS_MEMATTR_AARCH64_SH_CPU_INNER; } memattr |= (u64)out_attr << (8 * i); @@ -2339,7 +2346,7 @@ panthor_vm_create(struct panthor_device *ptdev, bool for_mcu, goto err_sched_fini; mair = io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg.arm_lpae_s1_cfg.mair; - vm->memattr = mair_to_memattr(mair); + vm->memattr = mair_to_memattr(mair, ptdev->coherent); mutex_lock(&ptdev->mmu->vm.lock); list_add_tail(&vm->node, &ptdev->mmu->vm.list); diff --git a/drivers/gpu/drm/pl111/pl111_drv.c b/drivers/gpu/drm/pl111/pl111_drv.c index 13362150b9c6..56ff6a3fb483 100644 --- a/drivers/gpu/drm/pl111/pl111_drv.c +++ b/drivers/gpu/drm/pl111/pl111_drv.c @@ -45,9 +45,9 @@ #include <linux/shmem_fs.h> #include <linux/slab.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_fourcc.h> @@ -220,7 +220,6 @@ static const struct drm_driver pl111_drm_driver = { .fops = &drm_fops, .name = "pl111", .desc = DRIVER_DESC, - .date = "20170317", .major = 1, .minor = 0, .patchlevel = 0, diff --git a/drivers/gpu/drm/qxl/Kconfig b/drivers/gpu/drm/qxl/Kconfig index 98a148bea628..69427eb8bed2 100644 --- a/drivers/gpu/drm/qxl/Kconfig +++ b/drivers/gpu/drm/qxl/Kconfig @@ -6,6 +6,7 @@ config DRM_QXL select DRM_KMS_HELPER select DRM_TTM select DRM_TTM_HELPER + select DRM_EXEC select CRC32 help QXL virtual GPU for Spice virtualization desktop integration. diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c index 21f752644242..417061ae59eb 100644 --- a/drivers/gpu/drm/qxl/qxl_drv.c +++ b/drivers/gpu/drm/qxl/qxl_drv.c @@ -34,9 +34,9 @@ #include <linux/pci.h> #include <linux/vgaarb.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_ttm.h> #include <drm/drm_file.h> @@ -300,7 +300,6 @@ static struct drm_driver qxl_driver = { .num_ioctls = ARRAY_SIZE(qxl_ioctls), .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = 0, .minor = 1, .patchlevel = 0, diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h index 32069acd93f8..cc02b5f10ad9 100644 --- a/drivers/gpu/drm/qxl/qxl_drv.h +++ b/drivers/gpu/drm/qxl/qxl_drv.h @@ -38,12 +38,12 @@ #include <drm/drm_crtc.h> #include <drm/drm_encoder.h> +#include <drm/drm_exec.h> #include <drm/drm_gem_ttm_helper.h> #include <drm/drm_ioctl.h> #include <drm/drm_gem.h> #include <drm/qxl_drm.h> #include <drm/ttm/ttm_bo.h> -#include <drm/ttm/ttm_execbuf_util.h> #include <drm/ttm/ttm_placement.h> #include "qxl_dev.h" @@ -54,7 +54,6 @@ struct iosys_map; #define DRIVER_NAME "qxl" #define DRIVER_DESC "RH QXL" -#define DRIVER_DATE "20120117" #define DRIVER_MAJOR 0 #define DRIVER_MINOR 1 @@ -101,7 +100,8 @@ struct qxl_gem { }; struct qxl_bo_list { - struct ttm_validate_buffer tv; + struct qxl_bo *bo; + struct list_head list; }; struct qxl_crtc { @@ -150,7 +150,7 @@ struct qxl_release { struct qxl_bo *release_bo; uint32_t release_offset; uint32_t surface_release_id; - struct ww_acquire_ctx ticket; + struct drm_exec exec; struct list_head bos; }; diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c index 368d26da0d6a..05204a6a3fa8 100644 --- a/drivers/gpu/drm/qxl/qxl_release.c +++ b/drivers/gpu/drm/qxl/qxl_release.c @@ -121,13 +121,11 @@ qxl_release_free_list(struct qxl_release *release) { while (!list_empty(&release->bos)) { struct qxl_bo_list *entry; - struct qxl_bo *bo; entry = container_of(release->bos.next, - struct qxl_bo_list, tv.head); - bo = to_qxl_bo(entry->tv.bo); - qxl_bo_unref(&bo); - list_del(&entry->tv.head); + struct qxl_bo_list, list); + qxl_bo_unref(&entry->bo); + list_del(&entry->list); kfree(entry); } release->release_bo = NULL; @@ -172,8 +170,8 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo) { struct qxl_bo_list *entry; - list_for_each_entry(entry, &release->bos, tv.head) { - if (entry->tv.bo == &bo->tbo) + list_for_each_entry(entry, &release->bos, list) { + if (entry->bo == bo) return 0; } @@ -182,9 +180,8 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo) return -ENOMEM; qxl_bo_ref(bo); - entry->tv.bo = &bo->tbo; - entry->tv.num_shared = 0; - list_add_tail(&entry->tv.head, &release->bos); + entry->bo = bo; + list_add_tail(&entry->list, &release->bos); return 0; } @@ -221,21 +218,28 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr) if (list_is_singular(&release->bos)) return 0; - ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos, - !no_intr, NULL); - if (ret) - return ret; - - list_for_each_entry(entry, &release->bos, tv.head) { - struct qxl_bo *bo = to_qxl_bo(entry->tv.bo); - - ret = qxl_release_validate_bo(bo); - if (ret) { - ttm_eu_backoff_reservation(&release->ticket, &release->bos); - return ret; + drm_exec_init(&release->exec, no_intr ? 0 : + DRM_EXEC_INTERRUPTIBLE_WAIT, 0); + drm_exec_until_all_locked(&release->exec) { + list_for_each_entry(entry, &release->bos, list) { + ret = drm_exec_prepare_obj(&release->exec, + &entry->bo->tbo.base, + 1); + drm_exec_retry_on_contention(&release->exec); + if (ret) + goto error; } } + + list_for_each_entry(entry, &release->bos, list) { + ret = qxl_release_validate_bo(entry->bo); + if (ret) + goto error; + } return 0; +error: + drm_exec_fini(&release->exec); + return ret; } void qxl_release_backoff_reserve_list(struct qxl_release *release) @@ -245,7 +249,7 @@ void qxl_release_backoff_reserve_list(struct qxl_release *release) if (list_is_singular(&release->bos)) return; - ttm_eu_backoff_reservation(&release->ticket, &release->bos); + drm_exec_fini(&release->exec); } int qxl_alloc_surface_release_reserved(struct qxl_device *qdev, @@ -404,18 +408,18 @@ void qxl_release_unmap(struct qxl_device *qdev, void qxl_release_fence_buffer_objects(struct qxl_release *release) { - struct ttm_buffer_object *bo; struct ttm_device *bdev; - struct ttm_validate_buffer *entry; + struct qxl_bo_list *entry; struct qxl_device *qdev; + struct qxl_bo *bo; /* if only one object on the release its the release itself since these objects are pinned no need to reserve */ if (list_is_singular(&release->bos) || list_empty(&release->bos)) return; - bo = list_first_entry(&release->bos, struct ttm_validate_buffer, head)->bo; - bdev = bo->bdev; + bo = list_first_entry(&release->bos, struct qxl_bo_list, list)->bo; + bdev = bo->tbo.bdev; qdev = container_of(bdev, struct qxl_device, mman.bdev); /* @@ -426,14 +430,12 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release) release->id | 0xf0000000, release->base.seqno); trace_dma_fence_emit(&release->base); - list_for_each_entry(entry, &release->bos, head) { + list_for_each_entry(entry, &release->bos, list) { bo = entry->bo; - dma_resv_add_fence(bo->base.resv, &release->base, + dma_resv_add_fence(bo->tbo.base.resv, &release->base, DMA_RESV_USAGE_READ); - ttm_bo_move_to_lru_tail_unlocked(bo); - dma_resv_unlock(bo->base.resv); + ttm_bo_move_to_lru_tail_unlocked(&bo->tbo); } - ww_acquire_fini(&release->ticket); + drm_exec_fini(&release->exec); } - diff --git a/drivers/gpu/drm/radeon/Kconfig b/drivers/gpu/drm/radeon/Kconfig index 9c6c74a75778..f51bace9555d 100644 --- a/drivers/gpu/drm/radeon/Kconfig +++ b/drivers/gpu/drm/radeon/Kconfig @@ -13,6 +13,7 @@ config DRM_RADEON select DRM_TTM select DRM_TTM_HELPER select FB_IOMEM_HELPERS if DRM_FBDEV_EMULATION + select DRM_EXEC select SND_HDA_COMPONENT if SND_HDA_CORE select POWER_SUPPLY select HWMON diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index fd8a4513025f..8605c074d9f7 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -75,8 +75,8 @@ #include <drm/ttm/ttm_bo.h> #include <drm/ttm/ttm_placement.h> -#include <drm/ttm/ttm_execbuf_util.h> +#include <drm/drm_exec.h> #include <drm/drm_gem.h> #include <drm/drm_audio_component.h> #include <drm/drm_suballoc.h> @@ -457,7 +457,8 @@ struct radeon_mman { struct radeon_bo_list { struct radeon_bo *robj; - struct ttm_validate_buffer tv; + struct list_head list; + bool shared; uint64_t gpu_offset; unsigned preferred_domains; unsigned allowed_domains; @@ -1030,6 +1031,7 @@ struct radeon_cs_parser { struct radeon_bo_list *vm_bos; struct list_head validated; unsigned dma_reloc_idx; + struct drm_exec exec; /* indices of various chunks */ struct radeon_cs_chunk *chunk_ib; struct radeon_cs_chunk *chunk_relocs; @@ -1043,7 +1045,6 @@ struct radeon_cs_parser { u32 cs_flags; u32 ring; s32 priority; - struct ww_acquire_ctx ticket; }; static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index a6700d7278bf..64b26bfeafc9 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -182,11 +182,8 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p) } } - p->relocs[i].tv.bo = &p->relocs[i].robj->tbo; - p->relocs[i].tv.num_shared = !r->write_domain; - - radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head, - priority); + p->relocs[i].shared = !r->write_domain; + radeon_cs_buckets_add(&buckets, &p->relocs[i].list, priority); } radeon_cs_buckets_get_list(&buckets, &p->validated); @@ -197,7 +194,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p) if (need_mmap_lock) mmap_read_lock(current->mm); - r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring); + r = radeon_bo_list_validate(p->rdev, &p->exec, &p->validated, p->ring); if (need_mmap_lock) mmap_read_unlock(current->mm); @@ -253,12 +250,11 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) struct radeon_bo_list *reloc; int r; - list_for_each_entry(reloc, &p->validated, tv.head) { + list_for_each_entry(reloc, &p->validated, list) { struct dma_resv *resv; resv = reloc->robj->tbo.base.resv; - r = radeon_sync_resv(p->rdev, &p->ib.sync, resv, - reloc->tv.num_shared); + r = radeon_sync_resv(p->rdev, &p->ib.sync, resv, reloc->shared); if (r) return r; } @@ -276,6 +272,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data) s32 priority = 0; INIT_LIST_HEAD(&p->validated); + drm_exec_init(&p->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0); if (!cs->num_chunks) { return 0; @@ -397,8 +394,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data) static int cmp_size_smaller_first(void *priv, const struct list_head *a, const struct list_head *b) { - struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, tv.head); - struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, tv.head); + struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, list); + struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, list); /* Sort A before B if A is smaller. */ if (la->robj->tbo.base.size > lb->robj->tbo.base.size) @@ -417,11 +414,13 @@ static int cmp_size_smaller_first(void *priv, const struct list_head *a, * If error is set than unvalidate buffer, otherwise just free memory * used by parsing context. **/ -static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bool backoff) +static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error) { unsigned i; if (!error) { + struct radeon_bo_list *reloc; + /* Sort the buffer list from the smallest to largest buffer, * which affects the order of buffers in the LRU list. * This assures that the smallest buffers are added first @@ -433,15 +432,17 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo * per frame under memory pressure. */ list_sort(NULL, &parser->validated, cmp_size_smaller_first); - - ttm_eu_fence_buffer_objects(&parser->ticket, - &parser->validated, - &parser->ib.fence->base); - } else if (backoff) { - ttm_eu_backoff_reservation(&parser->ticket, - &parser->validated); + list_for_each_entry(reloc, &parser->validated, list) { + dma_resv_add_fence(reloc->robj->tbo.base.resv, + &parser->ib.fence->base, + reloc->shared ? + DMA_RESV_USAGE_READ : + DMA_RESV_USAGE_WRITE); + } } + drm_exec_fini(&parser->exec); + if (parser->relocs != NULL) { for (i = 0; i < parser->nrelocs; i++) { struct radeon_bo *bo = parser->relocs[i].robj; @@ -693,7 +694,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) r = radeon_cs_parser_init(&parser, data); if (r) { DRM_ERROR("Failed to initialize parser !\n"); - radeon_cs_parser_fini(&parser, r, false); + radeon_cs_parser_fini(&parser, r); up_read(&rdev->exclusive_lock); r = radeon_cs_handle_lockup(rdev, r); return r; @@ -707,7 +708,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } if (r) { - radeon_cs_parser_fini(&parser, r, false); + radeon_cs_parser_fini(&parser, r); up_read(&rdev->exclusive_lock); r = radeon_cs_handle_lockup(rdev, r); return r; @@ -724,7 +725,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) goto out; } out: - radeon_cs_parser_fini(&parser, r, true); + radeon_cs_parser_fini(&parser, r); up_read(&rdev->exclusive_lock); r = radeon_cs_handle_lockup(rdev, r); return r; diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index 5e958cc223f4..267f082bc430 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -37,7 +37,7 @@ #include <linux/mmu_notifier.h> #include <linux/pci.h> -#include <drm/drm_client_setup.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_file.h> #include <drm/drm_fourcc.h> @@ -603,7 +603,6 @@ static const struct drm_driver kms_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = KMS_DRIVER_MAJOR, .minor = KMS_DRIVER_MINOR, .patchlevel = KMS_DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/radeon/radeon_drv.h b/drivers/gpu/drm/radeon/radeon_drv.h index 02a65971d140..0f3dbffc492d 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.h +++ b/drivers/gpu/drm/radeon/radeon_drv.h @@ -43,7 +43,6 @@ #define DRIVER_NAME "radeon" #define DRIVER_DESC "ATI Radeon" -#define DRIVER_DATE "20080528" /* Interface history: * diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index bf2d4b16dc2a..f86773f3db20 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -605,33 +605,40 @@ out: static void radeon_gem_va_update_vm(struct radeon_device *rdev, struct radeon_bo_va *bo_va) { - struct ttm_validate_buffer tv, *entry; - struct radeon_bo_list *vm_bos; - struct ww_acquire_ctx ticket; + struct radeon_bo_list *vm_bos, *entry; struct list_head list; + struct drm_exec exec; unsigned domain; int r; INIT_LIST_HEAD(&list); - tv.bo = &bo_va->bo->tbo; - tv.num_shared = 1; - list_add(&tv.head, &list); - vm_bos = radeon_vm_get_bos(rdev, bo_va->vm, &list); if (!vm_bos) return; - r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL); - if (r) - goto error_free; + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0); + drm_exec_until_all_locked(&exec) { + list_for_each_entry(entry, &list, list) { + r = drm_exec_prepare_obj(&exec, &entry->robj->tbo.base, + 1); + drm_exec_retry_on_contention(&exec); + if (unlikely(r)) + goto error_cleanup; + } - list_for_each_entry(entry, &list, head) { - domain = radeon_mem_type_to_domain(entry->bo->resource->mem_type); + r = drm_exec_prepare_obj(&exec, &bo_va->bo->tbo.base, 1); + drm_exec_retry_on_contention(&exec); + if (unlikely(r)) + goto error_cleanup; + } + + list_for_each_entry(entry, &list, list) { + domain = radeon_mem_type_to_domain(entry->robj->tbo.resource->mem_type); /* if anything is swapped out don't swap it in here, just abort and wait for the next CS */ if (domain == RADEON_GEM_DOMAIN_CPU) - goto error_unreserve; + goto error_cleanup; } mutex_lock(&bo_va->vm->mutex); @@ -645,10 +652,8 @@ static void radeon_gem_va_update_vm(struct radeon_device *rdev, error_unlock: mutex_unlock(&bo_va->vm->mutex); -error_unreserve: - ttm_eu_backoff_reservation(&ticket, &list); - -error_free: +error_cleanup: + drm_exec_fini(&exec); kvfree(vm_bos); if (r && r != -ERESTARTSYS) diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c index 7672404fdb29..a0fc0801abb0 100644 --- a/drivers/gpu/drm/radeon/radeon_object.c +++ b/drivers/gpu/drm/radeon/radeon_object.c @@ -464,23 +464,26 @@ static u64 radeon_bo_get_threshold_for_moves(struct radeon_device *rdev) } int radeon_bo_list_validate(struct radeon_device *rdev, - struct ww_acquire_ctx *ticket, + struct drm_exec *exec, struct list_head *head, int ring) { struct ttm_operation_ctx ctx = { true, false }; struct radeon_bo_list *lobj; - struct list_head duplicates; - int r; u64 bytes_moved = 0, initial_bytes_moved; u64 bytes_moved_threshold = radeon_bo_get_threshold_for_moves(rdev); + int r; - INIT_LIST_HEAD(&duplicates); - r = ttm_eu_reserve_buffers(ticket, head, true, &duplicates); - if (unlikely(r != 0)) { - return r; + drm_exec_until_all_locked(exec) { + list_for_each_entry(lobj, head, list) { + r = drm_exec_prepare_obj(exec, &lobj->robj->tbo.base, + 1); + drm_exec_retry_on_contention(exec); + if (unlikely(r && r != -EALREADY)) + return r; + } } - list_for_each_entry(lobj, head, tv.head) { + list_for_each_entry(lobj, head, list) { struct radeon_bo *bo = lobj->robj; if (!bo->tbo.pin_count) { u32 domain = lobj->preferred_domains; @@ -519,7 +522,6 @@ int radeon_bo_list_validate(struct radeon_device *rdev, domain = lobj->allowed_domains; goto retry; } - ttm_eu_backoff_reservation(ticket, head); return r; } } @@ -527,11 +529,6 @@ int radeon_bo_list_validate(struct radeon_device *rdev, lobj->tiling_flags = bo->tiling_flags; } - list_for_each_entry(lobj, &duplicates, tv.head) { - lobj->gpu_offset = radeon_bo_gpu_offset(lobj->robj); - lobj->tiling_flags = lobj->robj->tiling_flags; - } - return 0; } diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 39cc87a59a9a..d7bbb52db546 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -152,7 +152,7 @@ extern void radeon_bo_force_delete(struct radeon_device *rdev); extern int radeon_bo_init(struct radeon_device *rdev); extern void radeon_bo_fini(struct radeon_device *rdev); extern int radeon_bo_list_validate(struct radeon_device *rdev, - struct ww_acquire_ctx *ticket, + struct drm_exec *exec, struct list_head *head, int ring); extern int radeon_bo_set_tiling_flags(struct radeon_bo *bo, u32 tiling_flags, u32 pitch); diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c index c38b4d5d6a14..21a5340aefdf 100644 --- a/drivers/gpu/drm/radeon/radeon_vm.c +++ b/drivers/gpu/drm/radeon/radeon_vm.c @@ -142,10 +142,9 @@ struct radeon_bo_list *radeon_vm_get_bos(struct radeon_device *rdev, list[0].robj = vm->page_directory; list[0].preferred_domains = RADEON_GEM_DOMAIN_VRAM; list[0].allowed_domains = RADEON_GEM_DOMAIN_VRAM; - list[0].tv.bo = &vm->page_directory->tbo; - list[0].tv.num_shared = 1; + list[0].shared = true; list[0].tiling_flags = 0; - list_add(&list[0].tv.head, head); + list_add(&list[0].list, head); for (i = 0, idx = 1; i <= vm->max_pde_used; i++) { if (!vm->page_tables[i].bo) @@ -154,10 +153,9 @@ struct radeon_bo_list *radeon_vm_get_bos(struct radeon_device *rdev, list[idx].robj = vm->page_tables[i].bo; list[idx].preferred_domains = RADEON_GEM_DOMAIN_VRAM; list[idx].allowed_domains = RADEON_GEM_DOMAIN_VRAM; - list[idx].tv.bo = &list[idx].robj->tbo; - list[idx].tv.num_shared = 1; + list[idx].shared = true; list[idx].tiling_flags = 0; - list_add(&list[idx++].tv.head, head); + list_add(&list[idx++].list, head); } return list; diff --git a/drivers/gpu/drm/renesas/rcar-du/rcar_du_drv.c b/drivers/gpu/drm/renesas/rcar-du/rcar_du_drv.c index f9ecc334c024..d988433c8d3e 100644 --- a/drivers/gpu/drm/renesas/rcar-du/rcar_du_drv.c +++ b/drivers/gpu/drm/renesas/rcar-du/rcar_du_drv.c @@ -18,8 +18,8 @@ #include <linux/slab.h> #include <linux/wait.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -611,7 +611,6 @@ static const struct drm_driver rcar_du_driver = { .fops = &rcar_du_fops, .name = "rcar-du", .desc = "Renesas R-Car Display Unit", - .date = "20130110", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/renesas/rz-du/rzg2l_du_drv.c b/drivers/gpu/drm/renesas/rz-du/rzg2l_du_drv.c index b069efd8ffc3..cbd9b9841267 100644 --- a/drivers/gpu/drm/renesas/rz-du/rzg2l_du_drv.c +++ b/drivers/gpu/drm/renesas/rz-du/rzg2l_du_drv.c @@ -12,8 +12,8 @@ #include <linux/of.h> #include <linux/platform_device.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -84,7 +84,6 @@ static const struct drm_driver rzg2l_du_driver = { .fops = &rzg2l_du_fops, .name = "rzg2l-du", .desc = "Renesas RZ/G2L Display Unit", - .date = "20230410", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c b/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c index b99217b4e05d..90c6269ccd29 100644 --- a/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c +++ b/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c @@ -311,11 +311,11 @@ int rzg2l_du_modeset_init(struct rzg2l_du_device *rcdu) dev->mode_config.helper_private = &rzg2l_du_mode_config_helper; /* - * The RZ DU uses the VSP1 for memory access, and is limited - * to frame sizes of 1920x1080. + * The RZ DU was designed to support a frame size of 1920x1200 (landscape) + * or 1200x1920 (portrait). */ dev->mode_config.max_width = 1920; - dev->mode_config.max_height = 1080; + dev->mode_config.max_height = 1920; rcdu->num_crtcs = hweight8(rcdu->info->channels_mask); diff --git a/drivers/gpu/drm/renesas/shmobile/shmob_drm_drv.c b/drivers/gpu/drm/renesas/shmobile/shmob_drm_drv.c index 76ee3e16077c..2f31822b2245 100644 --- a/drivers/gpu/drm/renesas/shmobile/shmob_drm_drv.c +++ b/drivers/gpu/drm/renesas/shmobile/shmob_drm_drv.c @@ -17,8 +17,8 @@ #include <linux/pm_runtime.h> #include <linux/slab.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_fourcc.h> @@ -107,7 +107,6 @@ static const struct drm_driver shmob_drm_driver = { .fops = &shmob_drm_fops, .name = "shmob-drm", .desc = "Renesas SH Mobile DRM", - .date = "20120424", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c index 546d13f19f9b..cfcb45e6421c 100644 --- a/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c +++ b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c @@ -386,7 +386,7 @@ static int rockchip_dp_probe(struct platform_device *pdev) return -ENODEV; ret = drm_of_find_panel_or_bridge(dev->of_node, 1, 0, &panel, NULL); - if (ret < 0) + if (ret < 0 && ret != -ENODEV) return ret; dp = devm_kzalloc(dev, sizeof(*dp), GFP_KERNEL); diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.c b/drivers/gpu/drm/rockchip/cdn-dp-core.c index ff9d95e2c4d4..a7891a139c88 100644 --- a/drivers/gpu/drm/rockchip/cdn-dp-core.c +++ b/drivers/gpu/drm/rockchip/cdn-dp-core.c @@ -947,9 +947,6 @@ static void cdn_dp_pd_event_work(struct work_struct *work) { struct cdn_dp_device *dp = container_of(work, struct cdn_dp_device, event_work); - struct drm_connector *connector = &dp->connector; - enum drm_connector_status old_status; - int ret; mutex_lock(&dp->lock); @@ -1009,11 +1006,7 @@ static void cdn_dp_pd_event_work(struct work_struct *work) out: mutex_unlock(&dp->lock); - - old_status = connector->status; - connector->status = connector->funcs->detect(connector, false); - if (old_status != connector->status) - drm_kms_helper_hotplug_event(dp->drm_dev); + drm_connector_helper_hpd_irq_event(&dp->connector); } static int cdn_dp_pd_event(struct notifier_block *nb, diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c index 32d8394c4c49..5b24ee93635a 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c @@ -17,7 +17,7 @@ #include <linux/console.h> #include <linux/iommu.h> -#include <drm/drm_client_setup.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -39,7 +39,6 @@ #define DRIVER_NAME "rockchip" #define DRIVER_DESC "RockChip Soc DRM" -#define DRIVER_DATE "20140818" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -235,7 +234,6 @@ static const struct drm_driver rockchip_drm_driver = { .fops = &rockchip_drm_driver_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, }; diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c index 9873172e3fd3..0c8ec7220fbe 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c @@ -278,6 +278,15 @@ static u32 vop2_readl(struct vop2 *vop2, u32 offset) return val; } +static u32 vop2_vp_read(struct vop2_video_port *vp, u32 offset) +{ + u32 val; + + regmap_read(vp->vop2->map, vp->data->offset + offset, &val); + + return val; +} + static void vop2_win_write(const struct vop2_win *win, unsigned int reg, u32 v) { regmap_field_write(win->reg[reg], v); @@ -998,6 +1007,67 @@ static void vop2_disable(struct vop2 *vop2) clk_disable_unprepare(vop2->hclk); } +static bool vop2_vp_dsp_lut_is_enabled(struct vop2_video_port *vp) +{ + u32 dsp_ctrl = vop2_vp_read(vp, RK3568_VP_DSP_CTRL); + + return dsp_ctrl & RK3568_VP_DSP_CTRL__DSP_LUT_EN; +} + +static void vop2_vp_dsp_lut_disable(struct vop2_video_port *vp) +{ + u32 dsp_ctrl = vop2_vp_read(vp, RK3568_VP_DSP_CTRL); + + dsp_ctrl &= ~RK3568_VP_DSP_CTRL__DSP_LUT_EN; + vop2_vp_write(vp, RK3568_VP_DSP_CTRL, dsp_ctrl); +} + +static bool vop2_vp_dsp_lut_poll_disabled(struct vop2_video_port *vp) +{ + u32 dsp_ctrl; + int ret = readx_poll_timeout(vop2_vp_dsp_lut_is_enabled, vp, dsp_ctrl, + !dsp_ctrl, 5, 30 * 1000); + if (ret) { + drm_err(vp->vop2->drm, "display LUT RAM enable timeout!\n"); + return false; + } + + return true; +} + +static void vop2_vp_dsp_lut_enable(struct vop2_video_port *vp) +{ + u32 dsp_ctrl = vop2_vp_read(vp, RK3568_VP_DSP_CTRL); + + dsp_ctrl |= RK3568_VP_DSP_CTRL__DSP_LUT_EN; + vop2_vp_write(vp, RK3568_VP_DSP_CTRL, dsp_ctrl); +} + +static void vop2_vp_dsp_lut_update_enable(struct vop2_video_port *vp) +{ + u32 dsp_ctrl = vop2_vp_read(vp, RK3568_VP_DSP_CTRL); + + dsp_ctrl |= RK3588_VP_DSP_CTRL__GAMMA_UPDATE_EN; + vop2_vp_write(vp, RK3568_VP_DSP_CTRL, dsp_ctrl); +} + +static inline bool vop2_supports_seamless_gamma_lut_update(struct vop2 *vop2) +{ + return (vop2->data->soc_id != 3566 && vop2->data->soc_id != 3568); +} + +static bool vop2_gamma_lut_in_use(struct vop2 *vop2, struct vop2_video_port *vp) +{ + const int nr_vps = vop2->data->nr_vps; + int gamma_en_vp_id; + + for (gamma_en_vp_id = 0; gamma_en_vp_id < nr_vps; gamma_en_vp_id++) + if (vop2_vp_dsp_lut_is_enabled(&vop2->vps[gamma_en_vp_id])) + break; + + return gamma_en_vp_id != nr_vps && gamma_en_vp_id != vp->id; +} + static void vop2_crtc_atomic_disable(struct drm_crtc *crtc, struct drm_atomic_state *state) { @@ -1271,8 +1341,9 @@ static void vop2_plane_atomic_update(struct drm_plane *plane, dsp_w = drm_rect_width(dest); if (dest->x1 + dsp_w > adjusted_mode->hdisplay) { - drm_err(vop2->drm, "vp%d %s dest->x1[%d] + dsp_w[%d] exceed mode hdisplay[%d]\n", - vp->id, win->data->name, dest->x1, dsp_w, adjusted_mode->hdisplay); + drm_dbg_kms(vop2->drm, + "vp%d %s dest->x1[%d] + dsp_w[%d] exceed mode hdisplay[%d]\n", + vp->id, win->data->name, dest->x1, dsp_w, adjusted_mode->hdisplay); dsp_w = adjusted_mode->hdisplay - dest->x1; if (dsp_w < 4) dsp_w = 4; @@ -1282,8 +1353,9 @@ static void vop2_plane_atomic_update(struct drm_plane *plane, dsp_h = drm_rect_height(dest); if (dest->y1 + dsp_h > adjusted_mode->vdisplay) { - drm_err(vop2->drm, "vp%d %s dest->y1[%d] + dsp_h[%d] exceed mode vdisplay[%d]\n", - vp->id, win->data->name, dest->y1, dsp_h, adjusted_mode->vdisplay); + drm_dbg_kms(vop2->drm, + "vp%d %s dest->y1[%d] + dsp_h[%d] exceed mode vdisplay[%d]\n", + vp->id, win->data->name, dest->y1, dsp_h, adjusted_mode->vdisplay); dsp_h = adjusted_mode->vdisplay - dest->y1; if (dsp_h < 4) dsp_h = 4; @@ -1296,15 +1368,15 @@ static void vop2_plane_atomic_update(struct drm_plane *plane, */ if (!(win->data->feature & WIN_FEATURE_AFBDC)) { if (actual_w > dsp_w && (actual_w & 0xf) == 1) { - drm_err(vop2->drm, "vp%d %s act_w[%d] MODE 16 == 1\n", - vp->id, win->data->name, actual_w); + drm_dbg_kms(vop2->drm, "vp%d %s act_w[%d] MODE 16 == 1\n", + vp->id, win->data->name, actual_w); actual_w -= 1; } } if (afbc_en && actual_w % 4) { - drm_err(vop2->drm, "vp%d %s actual_w[%d] not 4 pixel aligned\n", - vp->id, win->data->name, actual_w); + drm_dbg_kms(vop2->drm, "vp%d %s actual_w[%d] not 4 pixel aligned\n", + vp->id, win->data->name, actual_w); actual_w = ALIGN_DOWN(actual_w, 4); } @@ -1341,8 +1413,8 @@ static void vop2_plane_atomic_update(struct drm_plane *plane, */ stride = (fb->pitches[0] << 3) / bpp; if ((stride & 0x3f) && (xmirror || rotate_90 || rotate_270)) - drm_err(vop2->drm, "vp%d %s stride[%d] not 64 pixel aligned\n", - vp->id, win->data->name, stride); + drm_dbg_kms(vop2->drm, "vp%d %s stride[%d] not 64 pixel aligned\n", + vp->id, win->data->name, stride); uv_swap = vop2_afbc_uv_swap(fb->format->format); /* @@ -1482,6 +1554,77 @@ static bool vop2_crtc_mode_fixup(struct drm_crtc *crtc, return true; } +static void vop2_crtc_write_gamma_lut(struct vop2 *vop2, struct drm_crtc *crtc) +{ + const struct vop2_video_port *vp = to_vop2_video_port(crtc); + const struct vop2_video_port_data *vp_data = &vop2->data->vp[vp->id]; + struct drm_color_lut *lut = crtc->state->gamma_lut->data; + unsigned int i, bpc = ilog2(vp_data->gamma_lut_len); + u32 word; + + for (i = 0; i < crtc->gamma_size; i++) { + word = (drm_color_lut_extract(lut[i].blue, bpc) << (2 * bpc)) | + (drm_color_lut_extract(lut[i].green, bpc) << bpc) | + drm_color_lut_extract(lut[i].red, bpc); + + writel(word, vop2->lut_regs + i * 4); + } +} + +static void vop2_crtc_atomic_set_gamma_seamless(struct vop2 *vop2, + struct vop2_video_port *vp, + struct drm_crtc *crtc) +{ + vop2_writel(vop2, RK3568_LUT_PORT_SEL, + FIELD_PREP(RK3588_LUT_PORT_SEL__GAMMA_AHB_WRITE_SEL, vp->id)); + vop2_vp_dsp_lut_enable(vp); + vop2_crtc_write_gamma_lut(vop2, crtc); + vop2_vp_dsp_lut_update_enable(vp); +} + +static void vop2_crtc_atomic_set_gamma_rk356x(struct vop2 *vop2, + struct vop2_video_port *vp, + struct drm_crtc *crtc) +{ + vop2_vp_dsp_lut_disable(vp); + vop2_cfg_done(vp); + if (!vop2_vp_dsp_lut_poll_disabled(vp)) + return; + + vop2_writel(vop2, RK3568_LUT_PORT_SEL, vp->id); + vop2_crtc_write_gamma_lut(vop2, crtc); + vop2_vp_dsp_lut_enable(vp); +} + +static void vop2_crtc_atomic_try_set_gamma(struct vop2 *vop2, + struct vop2_video_port *vp, + struct drm_crtc *crtc, + struct drm_crtc_state *crtc_state) +{ + if (!vop2->lut_regs || !crtc_state->color_mgmt_changed) + return; + + if (!crtc_state->gamma_lut) { + vop2_vp_dsp_lut_disable(vp); + return; + } + + if (vop2_supports_seamless_gamma_lut_update(vop2)) + vop2_crtc_atomic_set_gamma_seamless(vop2, vp, crtc); + else + vop2_crtc_atomic_set_gamma_rk356x(vop2, vp, crtc); +} + +static inline void vop2_crtc_atomic_try_set_gamma_locked(struct vop2 *vop2, + struct vop2_video_port *vp, + struct drm_crtc *crtc, + struct drm_crtc_state *crtc_state) +{ + vop2_lock(vop2); + vop2_crtc_atomic_try_set_gamma(vop2, vp, crtc, crtc_state); + vop2_unlock(vop2); +} + static void vop2_dither_setup(struct drm_crtc *crtc, u32 *dsp_ctrl) { struct rockchip_crtc_state *vcstate = to_rockchip_crtc_state(crtc->state); @@ -1721,9 +1864,9 @@ static unsigned long rk3588_calc_cru_cfg(struct vop2_video_port *vp, int id, else dclk_out_rate = v_pixclk >> 2; - dclk_rate = rk3588_calc_dclk(dclk_out_rate, 600000); + dclk_rate = rk3588_calc_dclk(dclk_out_rate, 600000000); if (!dclk_rate) { - drm_err(vop2->drm, "DP dclk_out_rate out of range, dclk_out_rate: %ld KHZ\n", + drm_err(vop2->drm, "DP dclk_out_rate out of range, dclk_out_rate: %ld Hz\n", dclk_out_rate); return 0; } @@ -1738,9 +1881,9 @@ static unsigned long rk3588_calc_cru_cfg(struct vop2_video_port *vp, int id, * dclk_rate = N * dclk_core_rate N = (1,2,4 ), * we get a little factor here */ - dclk_rate = rk3588_calc_dclk(dclk_out_rate, 600000); + dclk_rate = rk3588_calc_dclk(dclk_out_rate, 600000000); if (!dclk_rate) { - drm_err(vop2->drm, "MIPI dclk out of range, dclk_out_rate: %ld KHZ\n", + drm_err(vop2->drm, "MIPI dclk out of range, dclk_out_rate: %ld Hz\n", dclk_out_rate); return 0; } @@ -2057,11 +2200,40 @@ static void vop2_crtc_atomic_enable(struct drm_crtc *crtc, vop2_vp_write(vp, RK3568_VP_DSP_CTRL, dsp_ctrl); + vop2_crtc_atomic_try_set_gamma(vop2, vp, crtc, crtc_state); + drm_crtc_vblank_on(crtc); vop2_unlock(vop2); } +static int vop2_crtc_atomic_check_gamma(struct vop2_video_port *vp, + struct drm_crtc *crtc, + struct drm_atomic_state *state, + struct drm_crtc_state *crtc_state) +{ + struct vop2 *vop2 = vp->vop2; + unsigned int len; + + if (!vp->vop2->lut_regs || !crtc_state->color_mgmt_changed || + !crtc_state->gamma_lut) + return 0; + + len = drm_color_lut_size(crtc_state->gamma_lut); + if (len != crtc->gamma_size) { + drm_dbg(vop2->drm, "Invalid LUT size; got %d, expected %d\n", + len, crtc->gamma_size); + return -EINVAL; + } + + if (!vop2_supports_seamless_gamma_lut_update(vop2) && vop2_gamma_lut_in_use(vop2, vp)) { + drm_info(vop2->drm, "Gamma LUT can be enabled for only one CRTC at a time\n"); + return -EINVAL; + } + + return 0; +} + static int vop2_crtc_atomic_check(struct drm_crtc *crtc, struct drm_atomic_state *state) { @@ -2069,6 +2241,11 @@ static int vop2_crtc_atomic_check(struct drm_crtc *crtc, struct drm_plane *plane; int nplanes = 0; struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(state, crtc); + int ret; + + ret = vop2_crtc_atomic_check_gamma(vp, crtc, state, crtc_state); + if (ret) + return ret; drm_atomic_crtc_state_for_each_plane(plane, crtc_state) nplanes++; @@ -2487,7 +2664,13 @@ static void vop2_crtc_atomic_begin(struct drm_crtc *crtc, static void vop2_crtc_atomic_flush(struct drm_crtc *crtc, struct drm_atomic_state *state) { + struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(state, crtc); struct vop2_video_port *vp = to_vop2_video_port(crtc); + struct vop2 *vop2 = vp->vop2; + + /* In case of modeset, gamma lut update already happened in atomic enable */ + if (!drm_atomic_crtc_needs_modeset(crtc_state)) + vop2_crtc_atomic_try_set_gamma_locked(vop2, vp, crtc, crtc_state); vop2_post_config(crtc); @@ -2790,7 +2973,12 @@ static int vop2_create_crtcs(struct vop2 *vop2) } drm_crtc_helper_add(&vp->crtc, &vop2_crtc_helper_funcs); + if (vop2->lut_regs) { + const struct vop2_video_port_data *vp_data = &vop2_data->vp[vp->id]; + drm_mode_crtc_set_gamma_size(&vp->crtc, vp_data->gamma_lut_len); + drm_crtc_enable_color_mgmt(&vp->crtc, 0, false, vp_data->gamma_lut_len); + } init_completion(&vp->dsp_hold_completion); } diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.h b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.h index 615a16196aff..510dda6f9092 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.h +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.h @@ -394,6 +394,7 @@ enum dst_factor_mode { #define RK3568_REG_CFG_DONE__GLB_CFG_DONE_EN BIT(15) #define RK3568_VP_DSP_CTRL__STANDBY BIT(31) +#define RK3568_VP_DSP_CTRL__DSP_LUT_EN BIT(28) #define RK3568_VP_DSP_CTRL__DITHER_DOWN_MODE BIT(20) #define RK3568_VP_DSP_CTRL__DITHER_DOWN_SEL GENMASK(19, 18) #define RK3568_VP_DSP_CTRL__DITHER_DOWN_EN BIT(17) @@ -408,6 +409,8 @@ enum dst_factor_mode { #define RK3568_VP_DSP_CTRL__CORE_DCLK_DIV BIT(4) #define RK3568_VP_DSP_CTRL__OUT_MODE GENMASK(3, 0) +#define RK3588_VP_DSP_CTRL__GAMMA_UPDATE_EN BIT(22) + #define RK3588_VP_CLK_CTRL__DCLK_OUT_DIV GENMASK(3, 2) #define RK3588_VP_CLK_CTRL__DCLK_CORE_DIV GENMASK(1, 0) @@ -460,6 +463,8 @@ enum dst_factor_mode { #define RK3588_DSP_IF_POL__DP1_PIN_POL GENMASK(14, 12) #define RK3588_DSP_IF_POL__DP0_PIN_POL GENMASK(10, 8) +#define RK3588_LUT_PORT_SEL__GAMMA_AHB_WRITE_SEL GENMASK(13, 12) + #define RK3568_VP0_MIPI_CTRL__DCLK_DIV2_PHASE_LOCK BIT(5) #define RK3568_VP0_MIPI_CTRL__DCLK_DIV2 BIT(4) diff --git a/drivers/gpu/drm/solomon/ssd130x.c b/drivers/gpu/drm/solomon/ssd130x.c index 486d8f5282f9..b777690fd660 100644 --- a/drivers/gpu/drm/solomon/ssd130x.c +++ b/drivers/gpu/drm/solomon/ssd130x.c @@ -18,9 +18,9 @@ #include <linux/pwm.h> #include <linux/regulator/consumer.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_crtc_helper.h> #include <drm/drm_damage_helper.h> #include <drm/drm_edid.h> @@ -39,7 +39,6 @@ #define DRIVER_NAME "ssd130x" #define DRIVER_DESC "DRM driver for Solomon SSD13xx OLED displays" -#define DRIVER_DATE "20220131" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -1784,7 +1783,6 @@ static const struct drm_driver ssd130x_drm_driver = { DRM_FBDEV_SHMEM_DRIVER_OPS, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .driver_features = DRIVER_ATOMIC | DRIVER_GEM | DRIVER_MODESET, diff --git a/drivers/gpu/drm/sprd/sprd_drm.c b/drivers/gpu/drm/sprd/sprd_drm.c index bc1c747d3ea4..ceacdcb7c566 100644 --- a/drivers/gpu/drm/sprd/sprd_drm.c +++ b/drivers/gpu/drm/sprd/sprd_drm.c @@ -23,7 +23,6 @@ #define DRIVER_NAME "sprd" #define DRIVER_DESC "Spreadtrum SoCs' DRM Driver" -#define DRIVER_DATE "20200201" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -59,7 +58,6 @@ static struct drm_driver sprd_drm_drv = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, }; diff --git a/drivers/gpu/drm/sti/sti_drv.c b/drivers/gpu/drm/sti/sti_drv.c index 61ceff9aee7e..5e9332df21df 100644 --- a/drivers/gpu/drm/sti/sti_drv.c +++ b/drivers/gpu/drm/sti/sti_drv.c @@ -13,9 +13,9 @@ #include <linux/of_platform.h> #include <linux/platform_device.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_debugfs.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> @@ -29,7 +29,6 @@ #define DRIVER_NAME "sti" #define DRIVER_DESC "STMicroelectronics SoC DRM" -#define DRIVER_DATE "20140601" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -143,7 +142,6 @@ static const struct drm_driver sti_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, }; diff --git a/drivers/gpu/drm/stm/drv.c b/drivers/gpu/drm/stm/drv.c index bf090a354989..8ebcaf953782 100644 --- a/drivers/gpu/drm/stm/drv.c +++ b/drivers/gpu/drm/stm/drv.c @@ -16,9 +16,9 @@ #include <linux/platform_device.h> #include <linux/pm_runtime.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_fourcc.h> @@ -62,7 +62,6 @@ static const struct drm_driver drv_driver = { .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC, .name = "stm", .desc = "STMicroelectronics SoC DRM", - .date = "20170330", .major = 1, .minor = 0, .patchlevel = 0, diff --git a/drivers/gpu/drm/sun4i/sun4i_drv.c b/drivers/gpu/drm/sun4i/sun4i_drv.c index 5eccf58f2e17..c11dfb2739fa 100644 --- a/drivers/gpu/drm/sun4i/sun4i_drv.c +++ b/drivers/gpu/drm/sun4i/sun4i_drv.c @@ -15,8 +15,8 @@ #include <linux/of_reserved_mem.h> #include <linux/platform_device.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_dma_helper.h> @@ -50,7 +50,6 @@ static const struct drm_driver sun4i_drv_driver = { .fops = &sun4i_drv_fops, .name = "sun4i-drm", .desc = "Allwinner sun4i Display Engine", - .date = "20150629", .major = 1, .minor = 0, diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index bf3421667ecc..4596073fe28f 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -13,9 +13,9 @@ #include <linux/platform_device.h> #include <linux/pm_runtime.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_debugfs.h> #include <drm/drm_drv.h> #include <drm/drm_fourcc.h> @@ -35,7 +35,6 @@ #define DRIVER_NAME "tegra" #define DRIVER_DESC "NVIDIA Tegra graphics" -#define DRIVER_DATE "20120330" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0 @@ -901,7 +900,6 @@ static const struct drm_driver tegra_drm_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c index 1ad711f8d2a8..cacb5f3d8085 100644 --- a/drivers/gpu/drm/tidss/tidss_dispc.c +++ b/drivers/gpu/drm/tidss/tidss_dispc.c @@ -700,7 +700,7 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask) { dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc); - /* clear the irqstatus for newly enabled irqs */ + /* clear the irqstatus for irqs that will be enabled */ dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask); dispc_k2g_vp_set_irqenable(dispc, 0, mask); @@ -708,6 +708,9 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask) dispc_write(dispc, DISPC_IRQENABLE_SET, (1 << 0) | (1 << 7)); + /* clear the irqstatus for irqs that were disabled */ + dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask); + /* flush posted write */ dispc_k2g_read_irqenable(dispc); } @@ -780,24 +783,18 @@ static void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask) { unsigned int i; - u32 top_clear = 0; for (i = 0; i < dispc->feat->num_vps; ++i) { - if (clearmask & DSS_IRQ_VP_MASK(i)) { + if (clearmask & DSS_IRQ_VP_MASK(i)) dispc_k3_vp_write_irqstatus(dispc, i, clearmask); - top_clear |= BIT(i); - } } for (i = 0; i < dispc->feat->num_planes; ++i) { - if (clearmask & DSS_IRQ_PLANE_MASK(i)) { + if (clearmask & DSS_IRQ_PLANE_MASK(i)) dispc_k3_vid_write_irqstatus(dispc, i, clearmask); - top_clear |= BIT(4 + i); - } } - if (dispc->feat->subrev == DISPC_K2G) - return; - dispc_write(dispc, DISPC_IRQSTATUS, top_clear); + /* always clear the top level irqstatus */ + dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS)); /* Flush posted writes */ dispc_read(dispc, DISPC_IRQSTATUS); @@ -843,7 +840,7 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc, old_mask = dispc_k3_read_irqenable(dispc); - /* clear the irqstatus for newly enabled irqs */ + /* clear the irqstatus for irqs that will be enabled */ dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask); for (i = 0; i < dispc->feat->num_vps; ++i) { @@ -868,6 +865,9 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc, if (main_disable) dispc_write(dispc, DISPC_IRQENABLE_CLR, main_disable); + /* clear the irqstatus for irqs that were disabled */ + dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask); + /* Flush posted writes */ dispc_read(dispc, DISPC_IRQENABLE_SET); } @@ -2767,8 +2767,12 @@ static void dispc_init_errata(struct dispc_device *dispc) */ static void dispc_softreset_k2g(struct dispc_device *dispc) { + unsigned long flags; + + spin_lock_irqsave(&dispc->tidss->irq_lock, flags); dispc_set_irqenable(dispc, 0); dispc_read_and_clear_irqstatus(dispc); + spin_unlock_irqrestore(&dispc->tidss->irq_lock, flags); for (unsigned int vp_idx = 0; vp_idx < dispc->feat->num_vps; ++vp_idx) VP_REG_FLD_MOD(dispc, vp_idx, DISPC_VP_CONTROL, 0, 0, 0); diff --git a/drivers/gpu/drm/tidss/tidss_drv.c b/drivers/gpu/drm/tidss/tidss_drv.c index 7c8fd6407d82..d4652e8cc28c 100644 --- a/drivers/gpu/drm/tidss/tidss_drv.c +++ b/drivers/gpu/drm/tidss/tidss_drv.c @@ -9,9 +9,9 @@ #include <linux/module.h> #include <linux/pm_runtime.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_crtc.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> @@ -113,7 +113,6 @@ static const struct drm_driver tidss_driver = { DRM_FBDEV_DMA_DRIVER_OPS, .name = "tidss", .desc = "TI Keystone DSS", - .date = "20180215", .major = 1, .minor = 0, }; @@ -140,7 +139,7 @@ static int tidss_probe(struct platform_device *pdev) platform_set_drvdata(pdev, tidss); - spin_lock_init(&tidss->wait_lock); + spin_lock_init(&tidss->irq_lock); ret = dispc_init(tidss); if (ret) { diff --git a/drivers/gpu/drm/tidss/tidss_drv.h b/drivers/gpu/drm/tidss/tidss_drv.h index d7f27b0b0315..7f4f4282bc04 100644 --- a/drivers/gpu/drm/tidss/tidss_drv.h +++ b/drivers/gpu/drm/tidss/tidss_drv.h @@ -29,8 +29,9 @@ struct tidss_device { unsigned int irq; - spinlock_t wait_lock; /* protects the irq masks */ - dispc_irq_t irq_mask; /* enabled irqs in addition to wait_list */ + /* protects the irq masks field and irqenable/irqstatus registers */ + spinlock_t irq_lock; + dispc_irq_t irq_mask; /* enabled irqs */ }; #define to_tidss(__dev) container_of(__dev, struct tidss_device, ddev) diff --git a/drivers/gpu/drm/tidss/tidss_irq.c b/drivers/gpu/drm/tidss/tidss_irq.c index 604334ef526a..5abc788781f4 100644 --- a/drivers/gpu/drm/tidss/tidss_irq.c +++ b/drivers/gpu/drm/tidss/tidss_irq.c @@ -15,10 +15,9 @@ #include "tidss_irq.h" #include "tidss_plane.h" -/* call with wait_lock and dispc runtime held */ static void tidss_irq_update(struct tidss_device *tidss) { - assert_spin_locked(&tidss->wait_lock); + assert_spin_locked(&tidss->irq_lock); dispc_set_irqenable(tidss->dispc, tidss->irq_mask); } @@ -31,11 +30,11 @@ void tidss_irq_enable_vblank(struct drm_crtc *crtc) u32 hw_videoport = tcrtc->hw_videoport; unsigned long flags; - spin_lock_irqsave(&tidss->wait_lock, flags); + spin_lock_irqsave(&tidss->irq_lock, flags); tidss->irq_mask |= DSS_IRQ_VP_VSYNC_EVEN(hw_videoport) | DSS_IRQ_VP_VSYNC_ODD(hw_videoport); tidss_irq_update(tidss); - spin_unlock_irqrestore(&tidss->wait_lock, flags); + spin_unlock_irqrestore(&tidss->irq_lock, flags); } void tidss_irq_disable_vblank(struct drm_crtc *crtc) @@ -46,11 +45,11 @@ void tidss_irq_disable_vblank(struct drm_crtc *crtc) u32 hw_videoport = tcrtc->hw_videoport; unsigned long flags; - spin_lock_irqsave(&tidss->wait_lock, flags); + spin_lock_irqsave(&tidss->irq_lock, flags); tidss->irq_mask &= ~(DSS_IRQ_VP_VSYNC_EVEN(hw_videoport) | DSS_IRQ_VP_VSYNC_ODD(hw_videoport)); tidss_irq_update(tidss); - spin_unlock_irqrestore(&tidss->wait_lock, flags); + spin_unlock_irqrestore(&tidss->irq_lock, flags); } static irqreturn_t tidss_irq_handler(int irq, void *arg) @@ -60,7 +59,9 @@ static irqreturn_t tidss_irq_handler(int irq, void *arg) unsigned int id; dispc_irq_t irqstatus; + spin_lock(&tidss->irq_lock); irqstatus = dispc_read_and_clear_irqstatus(tidss->dispc); + spin_unlock(&tidss->irq_lock); for (id = 0; id < tidss->num_crtcs; id++) { struct drm_crtc *crtc = tidss->crtcs[id]; @@ -78,8 +79,13 @@ static irqreturn_t tidss_irq_handler(int irq, void *arg) tidss_crtc_error_irq(crtc, irqstatus); } - if (irqstatus & DSS_IRQ_DEVICE_OCP_ERR) - dev_err_ratelimited(tidss->dev, "OCP error\n"); + for (unsigned int i = 0; i < tidss->num_planes; ++i) { + struct drm_plane *plane = tidss->planes[i]; + struct tidss_plane *tplane = to_tidss_plane(plane); + + if (irqstatus & DSS_IRQ_PLANE_FIFO_UNDERFLOW(tplane->hw_plane_id)) + tidss_plane_error_irq(plane, irqstatus); + } return IRQ_HANDLED; } @@ -88,9 +94,9 @@ void tidss_irq_resume(struct tidss_device *tidss) { unsigned long flags; - spin_lock_irqsave(&tidss->wait_lock, flags); + spin_lock_irqsave(&tidss->irq_lock, flags); tidss_irq_update(tidss); - spin_unlock_irqrestore(&tidss->wait_lock, flags); + spin_unlock_irqrestore(&tidss->irq_lock, flags); } int tidss_irq_install(struct drm_device *ddev, unsigned int irq) @@ -105,7 +111,7 @@ int tidss_irq_install(struct drm_device *ddev, unsigned int irq) if (ret) return ret; - tidss->irq_mask = DSS_IRQ_DEVICE_OCP_ERR; + tidss->irq_mask = 0; for (unsigned int i = 0; i < tidss->num_crtcs; ++i) { struct tidss_crtc *tcrtc = to_tidss_crtc(tidss->crtcs[i]); @@ -115,6 +121,12 @@ int tidss_irq_install(struct drm_device *ddev, unsigned int irq) tidss->irq_mask |= DSS_IRQ_VP_FRAME_DONE(tcrtc->hw_videoport); } + for (unsigned int i = 0; i < tidss->num_planes; ++i) { + struct tidss_plane *tplane = to_tidss_plane(tidss->planes[i]); + + tidss->irq_mask |= DSS_IRQ_PLANE_FIFO_UNDERFLOW(tplane->hw_plane_id); + } + return 0; } diff --git a/drivers/gpu/drm/tidss/tidss_irq.h b/drivers/gpu/drm/tidss/tidss_irq.h index b512614d5863..dd61f645f662 100644 --- a/drivers/gpu/drm/tidss/tidss_irq.h +++ b/drivers/gpu/drm/tidss/tidss_irq.h @@ -19,15 +19,13 @@ * bit use |D |fou|FEOL|FEOL|FEOL|FEOL| UUUU | | * bit number|0 |1-3|4-7 |8-11| 12-19 | 20-23 | 24-31 | * - * device bits: D = OCP error + * device bits: D = Unused * WB bits: f = frame done wb, o = wb buffer overflow, * u = wb buffer uncomplete * vp bits: F = frame done, E = vsync even, O = vsync odd, L = sync lost * plane bits: U = fifo underflow */ -#define DSS_IRQ_DEVICE_OCP_ERR BIT(0) - #define DSS_IRQ_DEVICE_FRAMEDONEWB BIT(1) #define DSS_IRQ_DEVICE_WBBUFFEROVERFLOW BIT(2) #define DSS_IRQ_DEVICE_WBUNCOMPLETEERROR BIT(3) diff --git a/drivers/gpu/drm/tidss/tidss_plane.c b/drivers/gpu/drm/tidss/tidss_plane.c index a5d86822c9e3..116de124bddb 100644 --- a/drivers/gpu/drm/tidss/tidss_plane.c +++ b/drivers/gpu/drm/tidss/tidss_plane.c @@ -18,6 +18,14 @@ #include "tidss_drv.h" #include "tidss_plane.h" +void tidss_plane_error_irq(struct drm_plane *plane, u64 irqstatus) +{ + struct tidss_plane *tplane = to_tidss_plane(plane); + + dev_err_ratelimited(plane->dev->dev, "Plane%u underflow (irq %llx)\n", + tplane->hw_plane_id, irqstatus); +} + /* drm_plane_helper_funcs */ static int tidss_plane_atomic_check(struct drm_plane *plane, diff --git a/drivers/gpu/drm/tidss/tidss_plane.h b/drivers/gpu/drm/tidss/tidss_plane.h index e933e158b617..aecaf2728406 100644 --- a/drivers/gpu/drm/tidss/tidss_plane.h +++ b/drivers/gpu/drm/tidss/tidss_plane.h @@ -22,4 +22,6 @@ struct tidss_plane *tidss_plane_create(struct tidss_device *tidss, u32 crtc_mask, const u32 *formats, u32 num_formats); +void tidss_plane_error_irq(struct drm_plane *plane, u64 irqstatus); + #endif diff --git a/drivers/gpu/drm/tilcdc/tilcdc_drv.c b/drivers/gpu/drm/tilcdc/tilcdc_drv.c index 6f0df8d6b90c..7caec4d38ddf 100644 --- a/drivers/gpu/drm/tilcdc/tilcdc_drv.c +++ b/drivers/gpu/drm/tilcdc/tilcdc_drv.c @@ -13,8 +13,8 @@ #include <linux/platform_device.h> #include <linux/pm_runtime.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_debugfs.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> @@ -481,7 +481,6 @@ static const struct drm_driver tilcdc_driver = { .fops = &fops, .name = "tilcdc", .desc = "TI LCD Controller DRM", - .date = "20121205", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/Makefile b/drivers/gpu/drm/tiny/Makefile index 4aaf56f8707d..60816d2eb4ff 100644 --- a/drivers/gpu/drm/tiny/Makefile +++ b/drivers/gpu/drm/tiny/Makefile @@ -2,7 +2,7 @@ obj-$(CONFIG_DRM_ARCPGU) += arcpgu.o obj-$(CONFIG_DRM_BOCHS) += bochs.o -obj-$(CONFIG_DRM_CIRRUS_QEMU) += cirrus.o +obj-$(CONFIG_DRM_CIRRUS_QEMU) += cirrus-qemu.o obj-$(CONFIG_DRM_GM12U320) += gm12u320.o obj-$(CONFIG_DRM_OFDRM) += ofdrm.o obj-$(CONFIG_DRM_PANEL_MIPI_DBI) += panel-mipi-dbi.o diff --git a/drivers/gpu/drm/tiny/arcpgu.c b/drivers/gpu/drm/tiny/arcpgu.c index 0cc68042a6d6..2748d1f21d86 100644 --- a/drivers/gpu/drm/tiny/arcpgu.c +++ b/drivers/gpu/drm/tiny/arcpgu.c @@ -6,8 +6,9 @@ */ #include <linux/clk.h> + +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_debugfs.h> #include <drm/drm_device.h> #include <drm/drm_drv.h> @@ -289,7 +290,7 @@ static int arcpgu_load(struct arcpgu_drm_private *arcpgu) * There is only one output port inside each device. It is linked with * encoder endpoint. */ - endpoint_node = of_graph_get_next_endpoint(pdev->dev.of_node, NULL); + endpoint_node = of_graph_get_endpoint_by_regs(pdev->dev.of_node, 0, -1); if (endpoint_node) { encoder_node = of_graph_get_remote_port_parent(endpoint_node); of_node_put(endpoint_node); @@ -366,7 +367,6 @@ static const struct drm_driver arcpgu_drm_driver = { .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC, .name = "arcpgu", .desc = "ARC PGU Controller", - .date = "20160219", .major = 1, .minor = 0, .patchlevel = 0, diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c index 6f91ff1dbf7e..89a699370a59 100644 --- a/drivers/gpu/drm/tiny/bochs.c +++ b/drivers/gpu/drm/tiny/bochs.c @@ -5,9 +5,9 @@ #include <linux/module.h> #include <linux/pci.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_edid.h> @@ -680,7 +680,6 @@ static const struct drm_driver bochs_driver = { .fops = &bochs_fops, .name = "bochs-drm", .desc = "bochs dispi vga interface (qemu stdvga)", - .date = "20130925", .major = 1, .minor = 0, DRM_GEM_SHMEM_DRIVER_OPS, diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus-qemu.c index 4d2adcaeaa60..52ec1e4ea9e5 100644 --- a/drivers/gpu/drm/tiny/cirrus.c +++ b/drivers/gpu/drm/tiny/cirrus-qemu.c @@ -24,10 +24,10 @@ #include <video/cirrus.h> #include <video/vga.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_atomic_state_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_connector.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> @@ -46,9 +46,8 @@ #include <drm/drm_module.h> #include <drm/drm_probe_helper.h> -#define DRIVER_NAME "cirrus" +#define DRIVER_NAME "cirrus-qemu" #define DRIVER_DESC "qemu cirrus vga" -#define DRIVER_DATE "2019" #define DRIVER_MAJOR 2 #define DRIVER_MINOR 0 @@ -589,14 +588,14 @@ static int cirrus_pipe_init(struct cirrus_device *cirrus) encoder = &cirrus->encoder; ret = drm_encoder_init(dev, encoder, &cirrus_encoder_funcs, - DRM_MODE_ENCODER_DAC, NULL); + DRM_MODE_ENCODER_VIRTUAL, NULL); if (ret) return ret; encoder->possible_crtcs = drm_crtc_mask(crtc); connector = &cirrus->connector; ret = drm_connector_init(dev, connector, &cirrus_connector_funcs, - DRM_MODE_CONNECTOR_VGA); + DRM_MODE_CONNECTOR_VIRTUAL); if (ret) return ret; drm_connector_helper_add(connector, &cirrus_connector_helper_funcs); @@ -659,7 +658,6 @@ static const struct drm_driver cirrus_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, diff --git a/drivers/gpu/drm/tiny/gm12u320.c b/drivers/gpu/drm/tiny/gm12u320.c index 0c17ae532fb4..41e9bfb2e2ff 100644 --- a/drivers/gpu/drm/tiny/gm12u320.c +++ b/drivers/gpu/drm/tiny/gm12u320.c @@ -7,9 +7,9 @@ #include <linux/pm.h> #include <linux/usb.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_atomic_state_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_connector.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> @@ -34,7 +34,6 @@ MODULE_PARM_DESC(eco_mode, "Turn on Eco mode (less bright, more silent)"); #define DRIVER_NAME "gm12u320" #define DRIVER_DESC "Grain Media GM12U320 USB projector display" -#define DRIVER_DATE "2019" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -626,7 +625,6 @@ static const struct drm_driver gm12u320_drm_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, diff --git a/drivers/gpu/drm/tiny/hx8357d.c b/drivers/gpu/drm/tiny/hx8357d.c index 6b0d1846cfcf..df263818f45f 100644 --- a/drivers/gpu/drm/tiny/hx8357d.c +++ b/drivers/gpu/drm/tiny/hx8357d.c @@ -16,8 +16,8 @@ #include <linux/property.h> #include <linux/spi/spi.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_atomic_helper.h> @@ -199,7 +199,6 @@ static const struct drm_driver hx8357d_driver = { .debugfs_init = mipi_dbi_debugfs_init, .name = "hx8357d", .desc = "HX8357D", - .date = "20181023", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/ili9163.c b/drivers/gpu/drm/tiny/ili9163.c index 5eb39ca1a855..62cadf5e033d 100644 --- a/drivers/gpu/drm/tiny/ili9163.c +++ b/drivers/gpu/drm/tiny/ili9163.c @@ -7,8 +7,8 @@ #include <linux/property.h> #include <linux/spi/spi.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_atomic_helper.h> @@ -118,7 +118,6 @@ static struct drm_driver ili9163_driver = { .debugfs_init = mipi_dbi_debugfs_init, .name = "ili9163", .desc = "Ilitek ILI9163", - .date = "20210208", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/ili9225.c b/drivers/gpu/drm/tiny/ili9225.c index 875e2d09729a..6de44ff69b51 100644 --- a/drivers/gpu/drm/tiny/ili9225.c +++ b/drivers/gpu/drm/tiny/ili9225.c @@ -16,8 +16,8 @@ #include <linux/spi/spi.h> #include <video/mipi_display.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fb_dma_helper.h> @@ -364,7 +364,6 @@ static const struct drm_driver ili9225_driver = { DRM_FBDEV_DMA_DRIVER_OPS, .name = "ili9225", .desc = "Ilitek ILI9225", - .date = "20171106", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/ili9341.c b/drivers/gpu/drm/tiny/ili9341.c index c1dfdfbbd30c..e55029433509 100644 --- a/drivers/gpu/drm/tiny/ili9341.c +++ b/drivers/gpu/drm/tiny/ili9341.c @@ -15,8 +15,8 @@ #include <linux/property.h> #include <linux/spi/spi.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_atomic_helper.h> @@ -155,7 +155,6 @@ static const struct drm_driver ili9341_driver = { .debugfs_init = mipi_dbi_debugfs_init, .name = "ili9341", .desc = "Ilitek ILI9341", - .date = "20180514", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/ili9486.c b/drivers/gpu/drm/tiny/ili9486.c index 7e46a720d5e2..093661c771a0 100644 --- a/drivers/gpu/drm/tiny/ili9486.c +++ b/drivers/gpu/drm/tiny/ili9486.c @@ -14,8 +14,8 @@ #include <video/mipi_display.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_atomic_helper.h> @@ -177,7 +177,6 @@ static const struct drm_driver ili9486_driver = { .debugfs_init = mipi_dbi_debugfs_init, .name = "ili9486", .desc = "Ilitek ILI9486", - .date = "20200118", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/mi0283qt.c b/drivers/gpu/drm/tiny/mi0283qt.c index f1461c55dba6..b6b4664908ae 100644 --- a/drivers/gpu/drm/tiny/mi0283qt.c +++ b/drivers/gpu/drm/tiny/mi0283qt.c @@ -13,8 +13,8 @@ #include <linux/regulator/consumer.h> #include <linux/spi/spi.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_atomic_helper.h> @@ -159,7 +159,6 @@ static const struct drm_driver mi0283qt_driver = { .debugfs_init = mipi_dbi_debugfs_init, .name = "mi0283qt", .desc = "Multi-Inno MI0283QT", - .date = "20160614", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c index 9898eab5e9e2..13491c0e704a 100644 --- a/drivers/gpu/drm/tiny/ofdrm.c +++ b/drivers/gpu/drm/tiny/ofdrm.c @@ -5,9 +5,9 @@ #include <linux/pci.h> #include <linux/platform_device.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_state_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_connector.h> #include <drm/drm_damage_helper.h> #include <drm/drm_device.h> @@ -25,7 +25,6 @@ #define DRIVER_NAME "ofdrm" #define DRIVER_DESC "DRM driver for OF platform devices" -#define DRIVER_DATE "20220501" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -1348,7 +1347,6 @@ static struct drm_driver ofdrm_driver = { DRM_FBDEV_SHMEM_DRIVER_OPS, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .driver_features = DRIVER_ATOMIC | DRIVER_GEM | DRIVER_MODESET, diff --git a/drivers/gpu/drm/tiny/panel-mipi-dbi.c b/drivers/gpu/drm/tiny/panel-mipi-dbi.c index e66729b31bd6..4786b8144a9f 100644 --- a/drivers/gpu/drm/tiny/panel-mipi-dbi.c +++ b/drivers/gpu/drm/tiny/panel-mipi-dbi.c @@ -14,8 +14,8 @@ #include <linux/regulator/consumer.h> #include <linux/spi/spi.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_atomic_helper.h> @@ -269,7 +269,6 @@ static const struct drm_driver panel_mipi_dbi_driver = { .debugfs_init = mipi_dbi_debugfs_init, .name = "panel-mipi-dbi", .desc = "MIPI DBI compatible display panel", - .date = "20220103", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/repaper.c b/drivers/gpu/drm/tiny/repaper.c index 77944eb17b3c..52ba6c699bc8 100644 --- a/drivers/gpu/drm/tiny/repaper.c +++ b/drivers/gpu/drm/tiny/repaper.c @@ -21,8 +21,8 @@ #include <linux/spi/spi.h> #include <linux/thermal.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_connector.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> @@ -917,7 +917,6 @@ static const struct drm_driver repaper_driver = { DRM_FBDEV_DMA_DRIVER_OPS, .name = "repaper", .desc = "Pervasive Displays RePaper e-ink panels", - .date = "20170405", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/sharp-memory.c b/drivers/gpu/drm/tiny/sharp-memory.c index 2d2315bd6aef..03d2850310c4 100644 --- a/drivers/gpu/drm/tiny/sharp-memory.c +++ b/drivers/gpu/drm/tiny/sharp-memory.c @@ -1,8 +1,8 @@ // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_connector.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> @@ -107,7 +107,6 @@ static const struct drm_driver sharp_memory_drm_driver = { DRM_FBDEV_DMA_DRIVER_OPS, .name = "sharp_memory_display", .desc = "Sharp Display Memory LCD", - .date = "20231129", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c index 4d4f05dee244..5d9ab8adf800 100644 --- a/drivers/gpu/drm/tiny/simpledrm.c +++ b/drivers/gpu/drm/tiny/simpledrm.c @@ -10,9 +10,9 @@ #include <linux/pm_domain.h> #include <linux/regulator/consumer.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_state_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_connector.h> #include <drm/drm_crtc_helper.h> #include <drm/drm_damage_helper.h> @@ -31,7 +31,6 @@ #define DRIVER_NAME "simpledrm" #define DRIVER_DESC "DRM driver for simple-framebuffer platform devices" -#define DRIVER_DATE "20200625" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -1015,7 +1014,6 @@ static struct drm_driver simpledrm_driver = { DRM_FBDEV_SHMEM_DRIVER_OPS, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .driver_features = DRIVER_ATOMIC | DRIVER_GEM | DRIVER_MODESET, diff --git a/drivers/gpu/drm/tiny/st7586.c b/drivers/gpu/drm/tiny/st7586.c index 97013685c62f..a29672d84ede 100644 --- a/drivers/gpu/drm/tiny/st7586.c +++ b/drivers/gpu/drm/tiny/st7586.c @@ -12,8 +12,8 @@ #include <linux/spi/spi.h> #include <video/mipi_display.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fb_dma_helper.h> @@ -295,7 +295,6 @@ static const struct drm_driver st7586_driver = { .debugfs_init = mipi_dbi_debugfs_init, .name = "st7586", .desc = "Sitronix ST7586", - .date = "20170801", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/tiny/st7735r.c b/drivers/gpu/drm/tiny/st7735r.c index 0747ebd999cc..1d60f6e5b3bc 100644 --- a/drivers/gpu/drm/tiny/st7735r.c +++ b/drivers/gpu/drm/tiny/st7735r.c @@ -16,8 +16,8 @@ #include <linux/spi/spi.h> #include <video/mipi_display.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_gem_atomic_helper.h> @@ -160,7 +160,6 @@ static const struct drm_driver st7735r_driver = { .debugfs_init = mipi_dbi_debugfs_init, .name = "st7735r", .desc = "Sitronix ST7735R", - .date = "20171128", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 2c699ed1963a..a194db83421d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -58,13 +58,13 @@ static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) return VM_FAULT_RETRY; - ttm_bo_get(bo); + drm_gem_object_get(&bo->base); mmap_read_unlock(vmf->vma->vm_mm); (void)dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_KERNEL, true, MAX_SCHEDULE_TIMEOUT); dma_resv_unlock(bo->base.resv); - ttm_bo_put(bo); + drm_gem_object_put(&bo->base); return VM_FAULT_RETRY; } @@ -130,12 +130,12 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo, */ if (fault_flag_allow_retry_first(vmf->flags)) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { - ttm_bo_get(bo); + drm_gem_object_get(&bo->base); mmap_read_unlock(vmf->vma->vm_mm); if (!dma_resv_lock_interruptible(bo->base.resv, NULL)) dma_resv_unlock(bo->base.resv); - ttm_bo_put(bo); + drm_gem_object_put(&bo->base); } return VM_FAULT_RETRY; @@ -353,7 +353,7 @@ void ttm_bo_vm_open(struct vm_area_struct *vma) WARN_ON(bo->bdev->dev_mapping != vma->vm_file->f_mapping); - ttm_bo_get(bo); + drm_gem_object_get(&bo->base); } EXPORT_SYMBOL(ttm_bo_vm_open); @@ -361,7 +361,7 @@ void ttm_bo_vm_close(struct vm_area_struct *vma) { struct ttm_buffer_object *bo = vma->vm_private_data; - ttm_bo_put(bo); + drm_gem_object_put(&bo->base); vma->vm_private_data = NULL; } EXPORT_SYMBOL(ttm_bo_vm_close); @@ -405,13 +405,25 @@ static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo, return len; } -int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr, - void *buf, int len, int write) +/** + * ttm_bo_access - Helper to access a buffer object + * + * @bo: ttm buffer object + * @offset: access offset into buffer object + * @buf: pointer to caller memory to read into or write from + * @len: length of access + * @write: write access + * + * Utility function to access a buffer object. Useful when buffer object cannot + * be easily mapped (non-contiguous, non-visible, etc...). Should not directly + * be exported to user space via a peak / poke interface. + * + * Returns: + * @len if successful, negative error code on failure. + */ +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset, + void *buf, int len, int write) { - struct ttm_buffer_object *bo = vma->vm_private_data; - unsigned long offset = (addr) - vma->vm_start + - ((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node)) - << PAGE_SHIFT); int ret; if (len < 1 || (offset + len) > bo->base.size) @@ -429,8 +441,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr, break; default: if (bo->bdev->funcs->access_memory) - ret = bo->bdev->funcs->access_memory( - bo, offset, buf, len, write); + ret = bo->bdev->funcs->access_memory + (bo, offset, buf, len, write); else ret = -EIO; } @@ -439,6 +451,18 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr, return ret; } +EXPORT_SYMBOL(ttm_bo_access); + +int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr, + void *buf, int len, int write) +{ + struct ttm_buffer_object *bo = vma->vm_private_data; + unsigned long offset = (addr) - vma->vm_start + + ((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node)) + << PAGE_SHIFT); + + return ttm_bo_access(bo, offset, buf, len, write); +} EXPORT_SYMBOL(ttm_bo_vm_access); static const struct vm_operations_struct ttm_bo_vm_ops = { @@ -462,7 +486,7 @@ int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo) if (is_cow_mapping(vma->vm_flags)) return -EINVAL; - ttm_bo_get(bo); + drm_gem_object_get(&bo->base); /* * Drivers may want to override the vm_ops field. Otherwise we diff --git a/drivers/gpu/drm/tve200/tve200_drv.c b/drivers/gpu/drm/tve200/tve200_drv.c index c341aee37dd9..a048e37f1c2c 100644 --- a/drivers/gpu/drm/tve200/tve200_drv.c +++ b/drivers/gpu/drm/tve200/tve200_drv.c @@ -37,9 +37,9 @@ #include <linux/shmem_fs.h> #include <linux/slab.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_fourcc.h> @@ -146,7 +146,6 @@ static const struct drm_driver tve200_drm_driver = { .fops = &drm_fops, .name = "tve200", .desc = DRIVER_DESC, - .date = "20170703", .major = 1, .minor = 0, .patchlevel = 0, diff --git a/drivers/gpu/drm/udl/udl_drv.c b/drivers/gpu/drm/udl/udl_drv.c index 8d8ae40f945c..05b3a152cc33 100644 --- a/drivers/gpu/drm/udl/udl_drv.c +++ b/drivers/gpu/drm/udl/udl_drv.c @@ -5,8 +5,8 @@ #include <linux/module.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_drv.h> -#include <drm/drm_client_setup.h> #include <drm/drm_fbdev_shmem.h> #include <drm/drm_file.h> #include <drm/drm_gem_shmem_helper.h> @@ -78,7 +78,6 @@ static const struct drm_driver driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/udl/udl_drv.h b/drivers/gpu/drm/udl/udl_drv.h index 1eb716d9dad5..be00dc1d87a1 100644 --- a/drivers/gpu/drm/udl/udl_drv.h +++ b/drivers/gpu/drm/udl/udl_drv.h @@ -26,7 +26,6 @@ struct drm_mode_create_dumb; #define DRIVER_NAME "udl" #define DRIVER_DESC "DisplayLink" -#define DRIVER_DATE "20120220" #define DRIVER_MAJOR 0 #define DRIVER_MINOR 0 diff --git a/drivers/gpu/drm/v3d/v3d_debugfs.c b/drivers/gpu/drm/v3d/v3d_debugfs.c index 19e3ee7ac897..76816f2551c1 100644 --- a/drivers/gpu/drm/v3d/v3d_debugfs.c +++ b/drivers/gpu/drm/v3d/v3d_debugfs.c @@ -237,8 +237,8 @@ static int v3d_measure_clock(struct seq_file *m, void *unused) if (v3d->ver >= 40) { int cycle_count_reg = V3D_PCTR_CYCLE_COUNT(v3d->ver); V3D_CORE_WRITE(core, V3D_V4_PCTR_0_SRC_0_3, - V3D_SET_FIELD(cycle_count_reg, - V3D_PCTR_S0)); + V3D_SET_FIELD_VER(cycle_count_reg, + V3D_PCTR_S0, v3d->ver)); V3D_CORE_WRITE(core, V3D_V4_PCTR_0_CLR, 1); V3D_CORE_WRITE(core, V3D_V4_PCTR_0_EN, 1); } else { diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index bee51c942a56..930737a9347b 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -31,7 +31,6 @@ #define DRIVER_NAME "v3d" #define DRIVER_DESC "Broadcom V3D graphics" -#define DRIVER_DATE "20180419" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0 @@ -224,6 +223,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = { DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES, v3d_perfmon_get_values_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl, DRM_RENDER_ALLOW | DRM_AUTH), DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_COUNTER, v3d_perfmon_get_counter_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(V3D_PERFMON_SET_GLOBAL, v3d_perfmon_set_global_ioctl, DRM_RENDER_ALLOW), }; static const struct drm_driver v3d_drm_driver = { @@ -248,7 +248,6 @@ static const struct drm_driver v3d_drm_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h index de73eefff9ac..dc1cfe2e14be 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.h +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -183,6 +183,12 @@ struct v3d_dev { u32 num_allocated; u32 pages_allocated; } bo_stats; + + /* To support a performance analysis tool in user space, we require + * a single, globally configured performance monitor (perfmon) for + * all jobs. + */ + struct v3d_perfmon *global_perfmon; }; static inline struct v3d_dev * @@ -594,6 +600,8 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); +int v3d_perfmon_set_global_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); /* v3d_sysfs.c */ int v3d_sysfs_init(struct device *dev); diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c index 924814cab46a..3ebda2fa46fc 100644 --- a/drivers/gpu/drm/v3d/v3d_perfmon.c +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c @@ -240,17 +240,18 @@ void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon) for (i = 0; i < ncounters; i++) { u32 source = i / 4; - u32 channel = V3D_SET_FIELD(perfmon->counters[i], V3D_PCTR_S0); + u32 channel = V3D_SET_FIELD_VER(perfmon->counters[i], V3D_PCTR_S0, + v3d->ver); i++; - channel |= V3D_SET_FIELD(i < ncounters ? perfmon->counters[i] : 0, - V3D_PCTR_S1); + channel |= V3D_SET_FIELD_VER(i < ncounters ? perfmon->counters[i] : 0, + V3D_PCTR_S1, v3d->ver); i++; - channel |= V3D_SET_FIELD(i < ncounters ? perfmon->counters[i] : 0, - V3D_PCTR_S2); + channel |= V3D_SET_FIELD_VER(i < ncounters ? perfmon->counters[i] : 0, + V3D_PCTR_S2, v3d->ver); i++; - channel |= V3D_SET_FIELD(i < ncounters ? perfmon->counters[i] : 0, - V3D_PCTR_S3); + channel |= V3D_SET_FIELD_VER(i < ncounters ? perfmon->counters[i] : 0, + V3D_PCTR_S3, v3d->ver); V3D_CORE_WRITE(0, V3D_V4_PCTR_0_SRC_X(source), channel); } @@ -312,6 +313,9 @@ static int v3d_perfmon_idr_del(int id, void *elem, void *data) if (perfmon == v3d->active_perfmon) v3d_perfmon_stop(v3d, perfmon, false); + /* If the global perfmon is being destroyed, set it to NULL */ + cmpxchg(&v3d->global_perfmon, perfmon, NULL); + v3d_perfmon_put(perfmon); return 0; @@ -383,6 +387,7 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void *data, { struct v3d_file_priv *v3d_priv = file_priv->driver_priv; struct drm_v3d_perfmon_destroy *req = data; + struct v3d_dev *v3d = v3d_priv->v3d; struct v3d_perfmon *perfmon; mutex_lock(&v3d_priv->perfmon.lock); @@ -392,6 +397,13 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void *data, if (!perfmon) return -EINVAL; + /* If the active perfmon is being destroyed, stop it first */ + if (perfmon == v3d->active_perfmon) + v3d_perfmon_stop(v3d, perfmon, false); + + /* If the global perfmon is being destroyed, set it to NULL */ + cmpxchg(&v3d->global_perfmon, perfmon, NULL); + v3d_perfmon_put(perfmon); return 0; @@ -451,3 +463,34 @@ int v3d_perfmon_get_counter_ioctl(struct drm_device *dev, void *data, return 0; } + +int v3d_perfmon_set_global_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct v3d_file_priv *v3d_priv = file_priv->driver_priv; + struct drm_v3d_perfmon_set_global *req = data; + struct v3d_dev *v3d = to_v3d_dev(dev); + struct v3d_perfmon *perfmon; + + if (req->flags & ~DRM_V3D_PERFMON_CLEAR_GLOBAL) + return -EINVAL; + + perfmon = v3d_perfmon_find(v3d_priv, req->id); + if (!perfmon) + return -EINVAL; + + /* If the request is to clear the global performance monitor */ + if (req->flags & DRM_V3D_PERFMON_CLEAR_GLOBAL) { + if (!v3d->global_perfmon) + return -EINVAL; + + xchg(&v3d->global_perfmon, NULL); + + return 0; + } + + if (cmpxchg(&v3d->global_perfmon, NULL, perfmon)) + return -EBUSY; + + return 0; +} diff --git a/drivers/gpu/drm/v3d/v3d_regs.h b/drivers/gpu/drm/v3d/v3d_regs.h index 1b1a62ad9585..6da3c69082bd 100644 --- a/drivers/gpu/drm/v3d/v3d_regs.h +++ b/drivers/gpu/drm/v3d/v3d_regs.h @@ -15,6 +15,14 @@ fieldval & field##_MASK; \ }) +#define V3D_SET_FIELD_VER(value, field, ver) \ + ({ \ + typeof(ver) _ver = (ver); \ + u32 fieldval = (value) << field##_SHIFT(_ver); \ + WARN_ON((fieldval & ~field##_MASK(_ver)) != 0); \ + fieldval & field##_MASK(_ver); \ + }) + #define V3D_GET_FIELD(word, field) (((word) & field##_MASK) >> \ field##_SHIFT) @@ -354,18 +362,15 @@ #define V3D_V4_PCTR_0_SRC_28_31 0x0067c #define V3D_V4_PCTR_0_SRC_X(x) (V3D_V4_PCTR_0_SRC_0_3 + \ 4 * (x)) -# define V3D_PCTR_S0_MASK V3D_MASK(6, 0) -# define V3D_V7_PCTR_S0_MASK V3D_MASK(7, 0) -# define V3D_PCTR_S0_SHIFT 0 -# define V3D_PCTR_S1_MASK V3D_MASK(14, 8) -# define V3D_V7_PCTR_S1_MASK V3D_MASK(15, 8) -# define V3D_PCTR_S1_SHIFT 8 -# define V3D_PCTR_S2_MASK V3D_MASK(22, 16) -# define V3D_V7_PCTR_S2_MASK V3D_MASK(23, 16) -# define V3D_PCTR_S2_SHIFT 16 -# define V3D_PCTR_S3_MASK V3D_MASK(30, 24) -# define V3D_V7_PCTR_S3_MASK V3D_MASK(31, 24) -# define V3D_PCTR_S3_SHIFT 24 +# define V3D_PCTR_S0_MASK(ver) (((ver) >= 71) ? V3D_MASK(7, 0) : V3D_MASK(6, 0)) +# define V3D_PCTR_S0_SHIFT(ver) 0 +# define V3D_PCTR_S1_MASK(ver) (((ver) >= 71) ? V3D_MASK(15, 8) : V3D_MASK(14, 8)) +# define V3D_PCTR_S1_SHIFT(ver) 8 +# define V3D_PCTR_S2_MASK(ver) (((ver) >= 71) ? V3D_MASK(23, 16) : V3D_MASK(22, 16)) +# define V3D_PCTR_S2_SHIFT(ver) 16 +# define V3D_PCTR_S3_MASK(ver) (((ver) >= 71) ? V3D_MASK(31, 24) : V3D_MASK(30, 24)) +# define V3D_PCTR_S3_SHIFT(ver) 24 + #define V3D_PCTR_CYCLE_COUNT(ver) ((ver >= 71) ? 0 : 32) /* Output values of the counters. */ diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 99ac4995b5a1..a6c3760da6ed 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -120,11 +120,19 @@ v3d_cpu_job_free(struct drm_sched_job *sched_job) static void v3d_switch_perfmon(struct v3d_dev *v3d, struct v3d_job *job) { - if (job->perfmon != v3d->active_perfmon) + struct v3d_perfmon *perfmon = v3d->global_perfmon; + + if (!perfmon) + perfmon = job->perfmon; + + if (perfmon == v3d->active_perfmon) + return; + + if (perfmon != v3d->active_perfmon) v3d_perfmon_stop(v3d, v3d->active_perfmon, true); - if (job->perfmon && v3d->active_perfmon != job->perfmon) - v3d_perfmon_start(v3d, job->perfmon); + if (perfmon && v3d->active_perfmon != perfmon) + v3d_perfmon_start(v3d, perfmon); } static void diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c index d607aa9c4ec2..9e439c9f0a93 100644 --- a/drivers/gpu/drm/v3d/v3d_submit.c +++ b/drivers/gpu/drm/v3d/v3d_submit.c @@ -981,6 +981,11 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data, goto fail; if (args->perfmon_id) { + if (v3d->global_perfmon) { + ret = -EAGAIN; + goto fail_perfmon; + } + render->base.perfmon = v3d_perfmon_find(v3d_priv, args->perfmon_id); @@ -1196,6 +1201,11 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data, goto fail; if (args->perfmon_id) { + if (v3d->global_perfmon) { + ret = -EAGAIN; + goto fail_perfmon; + } + job->base.perfmon = v3d_perfmon_find(v3d_priv, args->perfmon_id); if (!job->base.perfmon) { diff --git a/drivers/gpu/drm/vboxvideo/vbox_drv.c b/drivers/gpu/drm/vboxvideo/vbox_drv.c index a536c467e2b2..bb861f0a0a31 100644 --- a/drivers/gpu/drm/vboxvideo/vbox_drv.c +++ b/drivers/gpu/drm/vboxvideo/vbox_drv.c @@ -13,8 +13,8 @@ #include <linux/pci.h> #include <linux/vt_kern.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_ttm.h> #include <drm/drm_file.h> @@ -189,7 +189,6 @@ static const struct drm_driver driver = { .fops = &vbox_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/vboxvideo/vbox_drv.h b/drivers/gpu/drm/vboxvideo/vbox_drv.h index e77bd6512eb1..dfa935f381a6 100644 --- a/drivers/gpu/drm/vboxvideo/vbox_drv.h +++ b/drivers/gpu/drm/vboxvideo/vbox_drv.h @@ -25,7 +25,6 @@ #define DRIVER_NAME "vboxvideo" #define DRIVER_DESC "Oracle VM VirtualBox Graphics Card" -#define DRIVER_DATE "20130823" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 diff --git a/drivers/gpu/drm/vc4/tests/vc4_mock.c b/drivers/gpu/drm/vc4/tests/vc4_mock.c index 6527fb1db71e..e276a957b01c 100644 --- a/drivers/gpu/drm/vc4/tests/vc4_mock.c +++ b/drivers/gpu/drm/vc4/tests/vc4_mock.c @@ -51,8 +51,8 @@ struct vc4_mock_desc { static const struct vc4_mock_desc vc4_mock = VC4_MOCK_DESC( - VC4_MOCK_CRTC_DESC(&vc4_txp_crtc_data, - VC4_MOCK_OUTPUT_DESC(VC4_ENCODER_TYPE_TXP, + VC4_MOCK_CRTC_DESC(&bcm2835_txp_data.base, + VC4_MOCK_OUTPUT_DESC(VC4_ENCODER_TYPE_TXP0, DRM_MODE_ENCODER_VIRTUAL, DRM_MODE_CONNECTOR_WRITEBACK)), VC4_MOCK_PIXELVALVE_DESC(&bcm2835_pv0_data, @@ -77,8 +77,8 @@ static const struct vc4_mock_desc vc4_mock = static const struct vc4_mock_desc vc5_mock = VC4_MOCK_DESC( - VC4_MOCK_CRTC_DESC(&vc4_txp_crtc_data, - VC4_MOCK_OUTPUT_DESC(VC4_ENCODER_TYPE_TXP, + VC4_MOCK_CRTC_DESC(&bcm2835_txp_data.base, + VC4_MOCK_OUTPUT_DESC(VC4_ENCODER_TYPE_TXP0, DRM_MODE_ENCODER_VIRTUAL, DRM_MODE_CONNECTOR_WRITEBACK)), VC4_MOCK_PIXELVALVE_DESC(&bcm2711_pv0_data, diff --git a/drivers/gpu/drm/vc4/tests/vc4_test_pv_muxing.c b/drivers/gpu/drm/vc4/tests/vc4_test_pv_muxing.c index 61622e951031..40a05869a50e 100644 --- a/drivers/gpu/drm/vc4/tests/vc4_test_pv_muxing.c +++ b/drivers/gpu/drm/vc4/tests/vc4_test_pv_muxing.c @@ -90,7 +90,7 @@ static const struct encoder_constraint vc4_encoder_constraints[] = { ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_DSI0, 0), ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_HDMI0, 1), ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_VEC, 1), - ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_TXP, 2), + ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_TXP0, 2), ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_DSI1, 2), }; @@ -98,7 +98,7 @@ static const struct encoder_constraint vc5_encoder_constraints[] = { ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_DPI, 0), ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_DSI0, 0), ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_VEC, 1), - ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_TXP, 0, 2), + ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_TXP0, 0, 2), ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_DSI1, 0, 1, 2), ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_HDMI0, 0, 1, 2), ENCODER_CONSTRAINT(VC4_ENCODER_TYPE_HDMI1, 0, 1, 2), @@ -207,7 +207,7 @@ static const struct pv_muxing_param vc4_test_pv_muxing_params[] = { VC4_PV_MUXING_TEST("1 output: DSI1", VC4_ENCODER_TYPE_DSI1), VC4_PV_MUXING_TEST("1 output: TXP", - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("2 outputs: DSI0, HDMI0", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_HDMI0), @@ -219,7 +219,7 @@ static const struct pv_muxing_param vc4_test_pv_muxing_params[] = { VC4_ENCODER_TYPE_DSI1), VC4_PV_MUXING_TEST("2 outputs: DSI0, TXP", VC4_ENCODER_TYPE_DSI0, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("2 outputs: DPI, HDMI0", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_HDMI0), @@ -231,19 +231,19 @@ static const struct pv_muxing_param vc4_test_pv_muxing_params[] = { VC4_ENCODER_TYPE_DSI1), VC4_PV_MUXING_TEST("2 outputs: DPI, TXP", VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("2 outputs: HDMI0, DSI1", VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_DSI1), VC4_PV_MUXING_TEST("2 outputs: HDMI0, TXP", VC4_ENCODER_TYPE_HDMI0, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("2 outputs: VEC, DSI1", VC4_ENCODER_TYPE_VEC, VC4_ENCODER_TYPE_DSI1), VC4_PV_MUXING_TEST("2 outputs: VEC, TXP", VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("3 outputs: DSI0, HDMI0, DSI1", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_HDMI0, @@ -251,7 +251,7 @@ static const struct pv_muxing_param vc4_test_pv_muxing_params[] = { VC4_PV_MUXING_TEST("3 outputs: DSI0, HDMI0, TXP", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_HDMI0, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("3 outputs: DSI0, VEC, DSI1", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, @@ -259,7 +259,7 @@ static const struct pv_muxing_param vc4_test_pv_muxing_params[] = { VC4_PV_MUXING_TEST("3 outputs: DSI0, VEC, TXP", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("3 outputs: DPI, HDMI0, DSI1", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_HDMI0, @@ -267,7 +267,7 @@ static const struct pv_muxing_param vc4_test_pv_muxing_params[] = { VC4_PV_MUXING_TEST("3 outputs: DPI, HDMI0, TXP", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_HDMI0, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("3 outputs: DPI, VEC, DSI1", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, @@ -275,7 +275,7 @@ static const struct pv_muxing_param vc4_test_pv_muxing_params[] = { VC4_PV_MUXING_TEST("3 outputs: DPI, VEC, TXP", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), }; KUNIT_ARRAY_PARAM(vc4_test_pv_muxing, @@ -287,7 +287,7 @@ static const struct pv_muxing_param vc4_test_pv_muxing_invalid_params[] = { VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_DSI0), VC4_PV_MUXING_TEST("TXP/DSI1 Conflict", - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1), VC4_PV_MUXING_TEST("HDMI0/VEC Conflict", VC4_ENCODER_TYPE_HDMI0, @@ -296,22 +296,22 @@ static const struct pv_muxing_param vc4_test_pv_muxing_invalid_params[] = { VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_DSI1, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, DSI1, TXP", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, VC4_ENCODER_TYPE_DSI1, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("More than 3 outputs: DPI, HDMI0, DSI1, TXP", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_DSI1, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC4_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, DSI1, TXP", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, VC4_ENCODER_TYPE_DSI1, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), }; KUNIT_ARRAY_PARAM(vc4_test_pv_muxing_invalid, @@ -342,7 +342,7 @@ static const struct pv_muxing_param vc5_test_pv_muxing_params[] = { VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("2 outputs: DPI, TXP", VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC5_PV_MUXING_TEST("2 outputs: DPI, VEC", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC), @@ -360,7 +360,7 @@ static const struct pv_muxing_param vc5_test_pv_muxing_params[] = { VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("2 outputs: DSI0, TXP", VC4_ENCODER_TYPE_DSI0, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC5_PV_MUXING_TEST("2 outputs: DSI0, VEC", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC), @@ -372,7 +372,7 @@ static const struct pv_muxing_param vc5_test_pv_muxing_params[] = { VC4_ENCODER_TYPE_VEC), VC5_PV_MUXING_TEST("2 outputs: DSI1, TXP", VC4_ENCODER_TYPE_DSI1, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC5_PV_MUXING_TEST("2 outputs: DSI1, HDMI0", VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0), @@ -384,7 +384,7 @@ static const struct pv_muxing_param vc5_test_pv_muxing_params[] = { VC4_ENCODER_TYPE_VEC), VC5_PV_MUXING_TEST("2 outputs: HDMI0, TXP", VC4_ENCODER_TYPE_HDMI0, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC5_PV_MUXING_TEST("2 outputs: HDMI0, HDMI1", VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), @@ -393,14 +393,14 @@ static const struct pv_muxing_param vc5_test_pv_muxing_params[] = { VC4_ENCODER_TYPE_VEC), VC5_PV_MUXING_TEST("2 outputs: HDMI1, TXP", VC4_ENCODER_TYPE_HDMI1, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC5_PV_MUXING_TEST("2 outputs: TXP, VEC", - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_VEC), VC5_PV_MUXING_TEST("3 outputs: DPI, VEC, TXP", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC5_PV_MUXING_TEST("3 outputs: DPI, VEC, DSI1", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, @@ -415,15 +415,15 @@ static const struct pv_muxing_param vc5_test_pv_muxing_params[] = { VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("3 outputs: DPI, TXP, DSI1", VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1), VC5_PV_MUXING_TEST("3 outputs: DPI, TXP, HDMI0", VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI0), VC5_PV_MUXING_TEST("3 outputs: DPI, TXP, HDMI1", VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("3 outputs: DPI, DSI1, HDMI0", VC4_ENCODER_TYPE_DPI, @@ -440,7 +440,7 @@ static const struct pv_muxing_param vc5_test_pv_muxing_params[] = { VC5_PV_MUXING_TEST("3 outputs: DSI0, VEC, TXP", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP), + VC4_ENCODER_TYPE_TXP0), VC5_PV_MUXING_TEST("3 outputs: DSI0, VEC, DSI1", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, @@ -455,15 +455,15 @@ static const struct pv_muxing_param vc5_test_pv_muxing_params[] = { VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("3 outputs: DSI0, TXP, DSI1", VC4_ENCODER_TYPE_DSI0, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1), VC5_PV_MUXING_TEST("3 outputs: DSI0, TXP, HDMI0", VC4_ENCODER_TYPE_DSI0, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI0), VC5_PV_MUXING_TEST("3 outputs: DSI0, TXP, HDMI1", VC4_ENCODER_TYPE_DSI0, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("3 outputs: DSI0, DSI1, HDMI0", VC4_ENCODER_TYPE_DSI0, @@ -490,17 +490,17 @@ static const struct pv_muxing_param vc5_test_pv_muxing_invalid_params[] = { VC5_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, TXP, DSI1", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, TXP, HDMI0", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI0), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, TXP, HDMI1", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, DSI1, HDMI0", VC4_ENCODER_TYPE_DPI, @@ -519,17 +519,17 @@ static const struct pv_muxing_param vc5_test_pv_muxing_invalid_params[] = { VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, TXP, DSI1, HDMI0", VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, TXP, DSI1, HDMI1", VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, TXP, HDMI0, HDMI1", VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, DSI1, HDMI0, HDMI1", @@ -540,19 +540,19 @@ static const struct pv_muxing_param vc5_test_pv_muxing_invalid_params[] = { VC5_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, TXP, DSI1, HDMI0", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, TXP, DSI1, HDMI1", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, TXP, HDMI0, HDMI1", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, DSI1, HDMI0, HDMI1", @@ -563,24 +563,24 @@ static const struct pv_muxing_param vc5_test_pv_muxing_invalid_params[] = { VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, TXP, DSI1, HDMI0, HDMI1", VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, TXP, DSI1", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, TXP, HDMI0", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI0), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, TXP, HDMI1", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, DSI1, HDMI0", VC4_ENCODER_TYPE_DSI0, @@ -599,17 +599,17 @@ static const struct pv_muxing_param vc5_test_pv_muxing_invalid_params[] = { VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, TXP, DSI1, HDMI0", VC4_ENCODER_TYPE_DSI0, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, TXP, DSI1, HDMI1", VC4_ENCODER_TYPE_DSI0, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, TXP, HDMI0, HDMI1", VC4_ENCODER_TYPE_DSI0, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, DSI1, HDMI0, HDMI1", @@ -620,19 +620,19 @@ static const struct pv_muxing_param vc5_test_pv_muxing_invalid_params[] = { VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, TXP, DSI1, HDMI0", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, TXP, DSI1, HDMI1", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, TXP, HDMI0, HDMI1", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, DSI1, HDMI0, HDMI1", @@ -643,27 +643,27 @@ static const struct pv_muxing_param vc5_test_pv_muxing_invalid_params[] = { VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, TXP, DSI1, HDMI0, HDMI1", VC4_ENCODER_TYPE_DSI0, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: VEC, TXP, DSI1, HDMI0, HDMI1", VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DPI, VEC, TXP, DSI1, HDMI0, HDMI1", VC4_ENCODER_TYPE_DPI, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), VC5_PV_MUXING_TEST("More than 3 outputs: DSI0, VEC, TXP, DSI1, HDMI0, HDMI1", VC4_ENCODER_TYPE_DSI0, VC4_ENCODER_TYPE_VEC, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_HDMI0, VC4_ENCODER_TYPE_HDMI1), diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c index ee82a959d279..cf40a53ad42e 100644 --- a/drivers/gpu/drm/vc4/vc4_crtc.c +++ b/drivers/gpu/drm/vc4/vc4_crtc.c @@ -83,13 +83,22 @@ static unsigned int vc4_crtc_get_cob_allocation(struct vc4_dev *vc4, unsigned int channel) { struct vc4_hvs *hvs = vc4->hvs; - u32 dispbase = HVS_READ(SCALER_DISPBASEX(channel)); + u32 dispbase, top, base; + /* Top/base are supposed to be 4-pixel aligned, but the * Raspberry Pi firmware fills the low bits (which are * presumably ignored). */ - u32 top = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_TOP) & ~3; - u32 base = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_BASE) & ~3; + + if (vc4->gen >= VC4_GEN_6_C) { + dispbase = HVS_READ(SCALER6_DISPX_COB(channel)); + top = VC4_GET_FIELD(dispbase, SCALER6_DISPX_COB_TOP) & ~3; + base = VC4_GET_FIELD(dispbase, SCALER6_DISPX_COB_BASE) & ~3; + } else { + dispbase = HVS_READ(SCALER_DISPBASEX(channel)); + top = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_TOP) & ~3; + base = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_BASE) & ~3; + } return top - base + 4; } @@ -122,7 +131,10 @@ static bool vc4_crtc_get_scanout_position(struct drm_crtc *crtc, * Read vertical scanline which is currently composed for our * pixelvalve by the HVS, and also the scaler status. */ - val = HVS_READ(SCALER_DISPSTATX(channel)); + if (vc4->gen >= VC4_GEN_6_C) + val = HVS_READ(SCALER6_DISPX_STATUS(channel)); + else + val = HVS_READ(SCALER_DISPSTATX(channel)); /* Get optional system timestamp after query. */ if (etime) @@ -131,7 +143,12 @@ static bool vc4_crtc_get_scanout_position(struct drm_crtc *crtc, /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */ /* Vertical position of hvs composed scanline. */ - *vpos = VC4_GET_FIELD(val, SCALER_DISPSTATX_LINE); + + if (vc4->gen >= VC4_GEN_6_C) + *vpos = VC4_GET_FIELD(val, SCALER6_DISPX_STATUS_YLINE); + else + *vpos = VC4_GET_FIELD(val, SCALER_DISPSTATX_LINE); + *hpos = 0; if (mode->flags & DRM_MODE_FLAG_INTERLACE) { @@ -223,6 +240,11 @@ static u32 vc4_get_fifo_full_level(struct vc4_crtc *vc4_crtc, u32 format) const struct vc4_crtc_data *crtc_data = vc4_crtc_to_vc4_crtc_data(vc4_crtc); const struct vc4_pv_data *pv_data = vc4_crtc_to_vc4_pv_data(vc4_crtc); struct vc4_dev *vc4 = to_vc4_dev(vc4_crtc->base.dev); + + /* + * NOTE: Could we use register 0x68 (PV_HW_CFG1) to get the FIFO + * size? + */ u32 fifo_len_bytes = pv_data->fifo_depth; /* @@ -404,6 +426,7 @@ static void vc4_crtc_config_pv(struct drm_crtc *crtc, struct drm_encoder *encode */ CRTC_WRITE(PV_V_CONTROL, PV_VCONTROL_CONTINUOUS | + (vc4->gen >= VC4_GEN_6_C ? PV_VCONTROL_ODD_TIMING : 0) | (is_dsi ? PV_VCONTROL_DSI : 0) | PV_VCONTROL_INTERLACE | (odd_field_first @@ -415,6 +438,7 @@ static void vc4_crtc_config_pv(struct drm_crtc *crtc, struct drm_encoder *encode } else { CRTC_WRITE(PV_V_CONTROL, PV_VCONTROL_CONTINUOUS | + (vc4->gen >= VC4_GEN_6_C ? PV_VCONTROL_ODD_TIMING : 0) | (is_dsi ? PV_VCONTROL_DSI : 0)); CRTC_WRITE(PV_VSYNCD_EVEN, 0); } @@ -429,11 +453,17 @@ static void vc4_crtc_config_pv(struct drm_crtc *crtc, struct drm_encoder *encode if (is_dsi) CRTC_WRITE(PV_HACT_ACT, mode->hdisplay * pixel_rep); - if (vc4->gen == VC4_GEN_5) + if (vc4->gen >= VC4_GEN_5) CRTC_WRITE(PV_MUX_CFG, VC4_SET_FIELD(PV_MUX_CFG_RGB_PIXEL_MUX_MODE_NO_SWAP, PV_MUX_CFG_RGB_PIXEL_MUX_MODE)); + if (vc4->gen >= VC4_GEN_6_C) + CRTC_WRITE(PV_PIPE_INIT_CTRL, + VC4_SET_FIELD(1, PV_PIPE_INIT_CTRL_PV_INIT_WIDTH) | + VC4_SET_FIELD(1, PV_PIPE_INIT_CTRL_PV_INIT_IDLE) | + PV_PIPE_INIT_CTRL_PV_INIT_EN); + CRTC_WRITE(PV_CONTROL, PV_CONTROL_FIFO_CLR | vc4_crtc_get_fifo_full_level_bits(vc4_crtc, format) | VC4_SET_FIELD(format, PV_CONTROL_FORMAT) | @@ -459,8 +489,10 @@ static void require_hvs_enabled(struct drm_device *dev) struct vc4_dev *vc4 = to_vc4_dev(dev); struct vc4_hvs *hvs = vc4->hvs; - WARN_ON_ONCE((HVS_READ(SCALER_DISPCTRL) & SCALER_DISPCTRL_ENABLE) != - SCALER_DISPCTRL_ENABLE); + if (vc4->gen >= VC4_GEN_6_C) + WARN_ON_ONCE(!(HVS_READ(SCALER6_CONTROL) & SCALER6_CONTROL_HVS_EN)); + else + WARN_ON_ONCE(!(HVS_READ(SCALER_DISPCTRL) & SCALER_DISPCTRL_ENABLE)); } static int vc4_crtc_disable(struct drm_crtc *crtc, @@ -530,7 +562,11 @@ int vc4_crtc_disable_at_boot(struct drm_crtc *crtc) if (!(of_device_is_compatible(vc4_crtc->pdev->dev.of_node, "brcm,bcm2711-pixelvalve2") || of_device_is_compatible(vc4_crtc->pdev->dev.of_node, - "brcm,bcm2711-pixelvalve4"))) + "brcm,bcm2711-pixelvalve4") || + of_device_is_compatible(vc4_crtc->pdev->dev.of_node, + "brcm,bcm2712-pixelvalve0") || + of_device_is_compatible(vc4_crtc->pdev->dev.of_node, + "brcm,bcm2712-pixelvalve1"))) return 0; if (!(CRTC_READ(PV_CONTROL) & PV_CONTROL_EN)) @@ -789,14 +825,21 @@ static void vc4_crtc_handle_page_flip(struct vc4_crtc *vc4_crtc) struct drm_device *dev = crtc->dev; struct vc4_dev *vc4 = to_vc4_dev(dev); struct vc4_hvs *hvs = vc4->hvs; + unsigned int current_dlist; u32 chan = vc4_crtc->current_hvs_channel; unsigned long flags; spin_lock_irqsave(&dev->event_lock, flags); spin_lock(&vc4_crtc->irq_lock); + + if (vc4->gen >= VC4_GEN_6_C) + current_dlist = VC4_GET_FIELD(HVS_READ(SCALER6_DISPX_DL(chan)), + SCALER6_DISPX_DL_LACT); + else + current_dlist = HVS_READ(SCALER_DISPLACTX(chan)); + if (vc4_crtc->event && - (vc4_crtc->current_dlist == HVS_READ(SCALER_DISPLACTX(chan)) || - vc4_crtc->feeds_txp)) { + (vc4_crtc->current_dlist == current_dlist || vc4_crtc->feeds_txp)) { drm_crtc_send_vblank_event(crtc, vc4_crtc->event); vc4_crtc->event = NULL; drm_crtc_vblank_put(crtc); @@ -807,7 +850,8 @@ static void vc4_crtc_handle_page_flip(struct vc4_crtc *vc4_crtc) * the CRTC and encoder already reconfigured, leading to * underruns. This can be seen when reconfiguring the CRTC. */ - vc4_hvs_unmask_underrun(hvs, chan); + if (vc4->gen < VC4_GEN_6_C) + vc4_hvs_unmask_underrun(hvs, chan); } spin_unlock(&vc4_crtc->irq_lock); spin_unlock_irqrestore(&dev->event_lock, flags); @@ -1265,6 +1309,32 @@ const struct vc4_pv_data bcm2711_pv4_data = { }, }; +const struct vc4_pv_data bcm2712_pv0_data = { + .base = { + .debugfs_name = "crtc0_regs", + .hvs_available_channels = BIT(0), + .hvs_output = 0, + }, + .fifo_depth = 64, + .pixels_per_clock = 1, + .encoder_types = { + [0] = VC4_ENCODER_TYPE_HDMI0, + }, +}; + +const struct vc4_pv_data bcm2712_pv1_data = { + .base = { + .debugfs_name = "crtc1_regs", + .hvs_available_channels = BIT(1), + .hvs_output = 1, + }, + .fifo_depth = 64, + .pixels_per_clock = 1, + .encoder_types = { + [0] = VC4_ENCODER_TYPE_HDMI1, + }, +}; + static const struct of_device_id vc4_crtc_dt_match[] = { { .compatible = "brcm,bcm2835-pixelvalve0", .data = &bcm2835_pv0_data }, { .compatible = "brcm,bcm2835-pixelvalve1", .data = &bcm2835_pv1_data }, @@ -1274,6 +1344,8 @@ static const struct of_device_id vc4_crtc_dt_match[] = { { .compatible = "brcm,bcm2711-pixelvalve2", .data = &bcm2711_pv2_data }, { .compatible = "brcm,bcm2711-pixelvalve3", .data = &bcm2711_pv3_data }, { .compatible = "brcm,bcm2711-pixelvalve4", .data = &bcm2711_pv4_data }, + { .compatible = "brcm,bcm2712-pixelvalve0", .data = &bcm2712_pv0_data }, + { .compatible = "brcm,bcm2712-pixelvalve1", .data = &bcm2712_pv1_data }, {} }; diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c index 2c60d37275b0..c7cb1e3a6434 100644 --- a/drivers/gpu/drm/vc4/vc4_drv.c +++ b/drivers/gpu/drm/vc4/vc4_drv.c @@ -31,8 +31,8 @@ #include <linux/platform_device.h> #include <linux/pm_runtime.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_dma.h> #include <drm/drm_fourcc.h> @@ -47,7 +47,6 @@ #define DRIVER_NAME "vc4" #define DRIVER_DESC "Broadcom VC4 graphics" -#define DRIVER_DATE "20140616" #define DRIVER_MAJOR 0 #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0 @@ -222,7 +221,6 @@ const struct drm_driver vc4_drm_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, @@ -244,7 +242,6 @@ const struct drm_driver vc5_drm_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, @@ -279,6 +276,7 @@ static void vc4_component_unbind_all(void *ptr) static const struct of_device_id vc4_dma_range_matches[] = { { .compatible = "brcm,bcm2711-hvs" }, + { .compatible = "brcm,bcm2712-hvs" }, { .compatible = "brcm,bcm2835-hvs" }, { .compatible = "brcm,bcm2835-v3d" }, { .compatible = "brcm,cygnus-v3d" }, @@ -300,16 +298,18 @@ static int vc4_drm_bind(struct device *dev) dev->coherent_dma_mask = DMA_BIT_MASK(32); - if (of_device_is_compatible(dev->of_node, "brcm,bcm2711-vc5")) - gen = VC4_GEN_5; - else - gen = VC4_GEN_4; + gen = (enum vc4_gen)of_device_get_match_data(dev); if (gen > VC4_GEN_4) driver = &vc5_drm_driver; else driver = &vc4_drm_driver; + if (gen >= VC4_GEN_6_C) + dma_set_mask_and_coherent(dev, DMA_BIT_MASK(36)); + else + dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)); + node = of_find_matching_node_and_match(NULL, vc4_dma_range_matches, NULL); if (node) { @@ -462,9 +462,11 @@ static void vc4_platform_drm_shutdown(struct platform_device *pdev) } static const struct of_device_id vc4_of_match[] = { - { .compatible = "brcm,bcm2711-vc5", }, - { .compatible = "brcm,bcm2835-vc4", }, - { .compatible = "brcm,cygnus-vc4", }, + { .compatible = "brcm,bcm2711-vc5", .data = (void *)VC4_GEN_5 }, + /* NB GEN_6_C will be corrected on D0 hw to GEN_6_D via vc4_hvs_bind */ + { .compatible = "brcm,bcm2712-vc6", .data = (void *)VC4_GEN_6_C }, + { .compatible = "brcm,bcm2835-vc4", .data = (void *)VC4_GEN_4 }, + { .compatible = "brcm,cygnus-vc4", .data = (void *)VC4_GEN_4 }, {}, }; MODULE_DEVICE_TABLE(of, vc4_of_match); diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h index c6be1997f1c7..4a078ffd9f82 100644 --- a/drivers/gpu/drm/vc4/vc4_drv.h +++ b/drivers/gpu/drm/vc4/vc4_drv.h @@ -84,6 +84,8 @@ struct vc4_perfmon { enum vc4_gen { VC4_GEN_4, VC4_GEN_5, + VC4_GEN_6_C, + VC4_GEN_6_D, }; struct vc4_dev { @@ -316,6 +318,21 @@ struct vc4_v3d { struct debugfs_regset32 regset; }; +#define VC4_NUM_UPM_HANDLES 32 +struct vc4_upm_refcounts { + refcount_t refcount; + + /* Allocation size */ + size_t size; + /* Our allocation in UPM for prefetching. */ + struct drm_mm_node upm; + + /* Pointer back to the HVS structure */ + struct vc4_hvs *hvs; +}; + +#define HVS_NUM_CHANNELS 3 + struct vc4_hvs { struct vc4_dev *vc4; struct platform_device *pdev; @@ -324,6 +341,7 @@ struct vc4_hvs { unsigned int dlist_mem_size; struct clk *core_clk; + struct clk *disp_clk; unsigned long max_core_rate; @@ -331,8 +349,15 @@ struct vc4_hvs { * list. Units are dwords. */ struct drm_mm dlist_mm; + /* Memory manager for the LBM memory used by HVS scaling. */ struct drm_mm lbm_mm; + + /* Memory manager for the UPM memory used for prefetching. */ + struct drm_mm upm_mm; + struct ida upm_handles; + struct vc4_upm_refcounts upm_refcounts[VC4_NUM_UPM_HANDLES + 1]; + spinlock_t mm_lock; struct drm_mm_node mitchell_netravali_filter; @@ -355,6 +380,7 @@ struct vc4_hvs { }; #define HVS_NUM_CHANNELS 3 +#define HVS_UBM_WORD_SIZE 256 struct vc4_hvs_state { struct drm_private_state base; @@ -424,6 +450,12 @@ struct vc4_plane_state { /* Our allocation in LBM for temporary storage during scaling. */ struct drm_mm_node lbm; + /* The Unified Pre-Fetcher Handle */ + unsigned int upm_handle[DRM_FORMAT_MAX_PLANES]; + + /* Number of lines to pre-fetch */ + unsigned int upm_buffer_lines; + /* Set when the plane has per-pixel alpha content or does not cover * the entire screen. This is a hint to the CRTC that it might need * to enable background color fill. @@ -458,7 +490,8 @@ enum vc4_encoder_type { VC4_ENCODER_TYPE_DSI1, VC4_ENCODER_TYPE_SMI, VC4_ENCODER_TYPE_DPI, - VC4_ENCODER_TYPE_TXP, + VC4_ENCODER_TYPE_TXP0, + VC4_ENCODER_TYPE_TXP1, }; struct vc4_encoder { @@ -505,7 +538,16 @@ struct vc4_crtc_data { int hvs_output; }; -extern const struct vc4_crtc_data vc4_txp_crtc_data; +struct vc4_txp_data { + struct vc4_crtc_data base; + enum vc4_encoder_type encoder_type; + unsigned int high_addr_ptr_reg; + unsigned int has_byte_enable:1; + unsigned int size_minus_one:1; + unsigned int supports_40bit_addresses:1; +}; + +extern const struct vc4_txp_data bcm2835_txp_data; struct vc4_pv_data { struct vc4_crtc_data base; @@ -527,6 +569,8 @@ extern const struct vc4_pv_data bcm2711_pv1_data; extern const struct vc4_pv_data bcm2711_pv2_data; extern const struct vc4_pv_data bcm2711_pv3_data; extern const struct vc4_pv_data bcm2711_pv4_data; +extern const struct vc4_pv_data bcm2712_pv0_data; +extern const struct vc4_pv_data bcm2712_pv1_data; struct vc4_crtc { struct drm_crtc base; @@ -637,6 +681,12 @@ struct vc4_crtc_state { writel(val, hvs->regs + (offset)); \ } while (0) +#define HVS_READ6(offset) \ + HVS_READ(hvs->vc4->gen == VC4_GEN_6_C ? SCALER6_ ## offset : SCALER6D_ ## offset) + +#define HVS_WRITE6(offset, val) \ + HVS_WRITE(hvs->vc4->gen == VC4_GEN_6_C ? SCALER6_ ## offset : SCALER6D_ ## offset, val) + #define VC4_REG32(reg) { .name = #reg, .offset = reg } struct vc4_exec_info { diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c index e3818c48c9b8..d6ebcb9cd999 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c @@ -845,6 +845,7 @@ static void vc4_hdmi_encoder_post_crtc_disable(struct drm_encoder *encoder, { struct vc4_hdmi *vc4_hdmi = encoder_to_vc4_hdmi(encoder); struct drm_device *drm = vc4_hdmi->connector.dev; + struct vc4_dev *vc4 = to_vc4_dev(drm); unsigned long flags; int idx; @@ -861,14 +862,25 @@ static void vc4_hdmi_encoder_post_crtc_disable(struct drm_encoder *encoder, HDMI_WRITE(HDMI_VID_CTL, HDMI_READ(HDMI_VID_CTL) | VC4_HD_VID_CTL_CLRRGB); + if (vc4->gen >= VC4_GEN_6_C) + HDMI_WRITE(HDMI_VID_CTL, HDMI_READ(HDMI_VID_CTL) | + VC4_HD_VID_CTL_BLANKPIX); + spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags); mdelay(1); - spin_lock_irqsave(&vc4_hdmi->hw_lock, flags); - HDMI_WRITE(HDMI_VID_CTL, - HDMI_READ(HDMI_VID_CTL) & ~VC4_HD_VID_CTL_ENABLE); - spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags); + /* + * TODO: This should work on BCM2712, but doesn't for some + * reason and result in a system lockup. + */ + if (vc4->gen < VC4_GEN_6_C) { + spin_lock_irqsave(&vc4_hdmi->hw_lock, flags); + HDMI_WRITE(HDMI_VID_CTL, + HDMI_READ(HDMI_VID_CTL) & + ~VC4_HD_VID_CTL_ENABLE); + spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags); + } vc4_hdmi_disable_scrambling(encoder); @@ -1488,7 +1500,6 @@ static void vc4_hdmi_encoder_pre_crtc_configure(struct drm_encoder *encoder, goto err_put_runtime_pm; } - vc4_hdmi_cec_update_clk_div(vc4_hdmi); if (tmds_char_rate > 297000000) @@ -1594,6 +1605,8 @@ static void vc4_hdmi_encoder_post_crtc_enable(struct drm_encoder *encoder, spin_lock_irqsave(&vc4_hdmi->hw_lock, flags); HDMI_WRITE(HDMI_VID_CTL, + (HDMI_READ(HDMI_VID_CTL) & + ~(VC4_HD_VID_CTL_VSYNC_LOW | VC4_HD_VID_CTL_HSYNC_LOW)) | VC4_HD_VID_CTL_ENABLE | VC4_HD_VID_CTL_CLRRGB | VC4_HD_VID_CTL_UNDERFLOW_ENABLE | @@ -2110,18 +2123,33 @@ static int vc4_hdmi_audio_prepare(struct device *dev, void *data, VC4_HDMI_AUDIO_PACKET_CEA_MASK); /* Set the MAI threshold */ - if (vc4->gen >= VC4_GEN_5) + switch (vc4->gen) { + case VC4_GEN_6_D: + HDMI_WRITE(HDMI_MAI_THR, + VC4_SET_FIELD(0x10, VC6_D_HD_MAI_THR_PANICHIGH) | + VC4_SET_FIELD(0x10, VC6_D_HD_MAI_THR_PANICLOW) | + VC4_SET_FIELD(0x1c, VC6_D_HD_MAI_THR_DREQHIGH) | + VC4_SET_FIELD(0x1c, VC6_D_HD_MAI_THR_DREQLOW)); + break; + case VC4_GEN_6_C: + case VC4_GEN_5: HDMI_WRITE(HDMI_MAI_THR, VC4_SET_FIELD(0x10, VC4_HD_MAI_THR_PANICHIGH) | VC4_SET_FIELD(0x10, VC4_HD_MAI_THR_PANICLOW) | VC4_SET_FIELD(0x1c, VC4_HD_MAI_THR_DREQHIGH) | VC4_SET_FIELD(0x1c, VC4_HD_MAI_THR_DREQLOW)); - else + break; + case VC4_GEN_4: HDMI_WRITE(HDMI_MAI_THR, VC4_SET_FIELD(0x8, VC4_HD_MAI_THR_PANICHIGH) | VC4_SET_FIELD(0x8, VC4_HD_MAI_THR_PANICLOW) | VC4_SET_FIELD(0x6, VC4_HD_MAI_THR_DREQHIGH) | VC4_SET_FIELD(0x8, VC4_HD_MAI_THR_DREQLOW)); + break; + default: + drm_err(drm, "Unknown VC4 generation: %d", vc4->gen); + break; + } HDMI_WRITE(HDMI_MAI_CONFIG, VC4_HDMI_MAI_CONFIG_BIT_REVERSE | @@ -3121,6 +3149,7 @@ static int vc4_hdmi_runtime_suspend(struct device *dev) { struct vc4_hdmi *vc4_hdmi = dev_get_drvdata(dev); + clk_disable_unprepare(vc4_hdmi->audio_clock); clk_disable_unprepare(vc4_hdmi->hsm_clock); return 0; @@ -3153,6 +3182,10 @@ static int vc4_hdmi_runtime_resume(struct device *dev) goto err_disable_clk; } + ret = clk_prepare_enable(vc4_hdmi->audio_clock); + if (ret) + goto err_disable_clk; + if (vc4_hdmi->variant->reset) vc4_hdmi->variant->reset(vc4_hdmi); @@ -3273,7 +3306,9 @@ static int vc4_hdmi_bind(struct device *dev, struct device *master, void *data) return ret; if ((of_device_is_compatible(dev->of_node, "brcm,bcm2711-hdmi0") || - of_device_is_compatible(dev->of_node, "brcm,bcm2711-hdmi1")) && + of_device_is_compatible(dev->of_node, "brcm,bcm2711-hdmi1") || + of_device_is_compatible(dev->of_node, "brcm,bcm2712-hdmi0") || + of_device_is_compatible(dev->of_node, "brcm,bcm2712-hdmi1")) && HDMI_READ(HDMI_VID_CTL) & VC4_HD_VID_CTL_ENABLE) { clk_prepare_enable(vc4_hdmi->pixel_clock); clk_prepare_enable(vc4_hdmi->hsm_clock); @@ -3407,10 +3442,66 @@ static const struct vc4_hdmi_variant bcm2711_hdmi1_variant = { .hp_detect = vc5_hdmi_hp_detect, }; +static const struct vc4_hdmi_variant bcm2712_hdmi0_variant = { + .encoder_type = VC4_ENCODER_TYPE_HDMI0, + .debugfs_name = "hdmi0_regs", + .card_name = "vc4-hdmi-0", + .max_pixel_clock = 600000000, + .registers = vc6_hdmi_hdmi0_fields, + .num_registers = ARRAY_SIZE(vc6_hdmi_hdmi0_fields), + .phy_lane_mapping = { + PHY_LANE_0, + PHY_LANE_1, + PHY_LANE_2, + PHY_LANE_CK, + }, + .unsupported_odd_h_timings = false, + .external_irq_controller = true, + + .init_resources = vc5_hdmi_init_resources, + .csc_setup = vc5_hdmi_csc_setup, + .reset = vc5_hdmi_reset, + .set_timings = vc5_hdmi_set_timings, + .phy_init = vc6_hdmi_phy_init, + .phy_disable = vc6_hdmi_phy_disable, + .channel_map = vc5_hdmi_channel_map, + .supports_hdr = true, + .hp_detect = vc5_hdmi_hp_detect, +}; + +static const struct vc4_hdmi_variant bcm2712_hdmi1_variant = { + .encoder_type = VC4_ENCODER_TYPE_HDMI1, + .debugfs_name = "hdmi1_regs", + .card_name = "vc4-hdmi-1", + .max_pixel_clock = 600000000, + .registers = vc6_hdmi_hdmi1_fields, + .num_registers = ARRAY_SIZE(vc6_hdmi_hdmi1_fields), + .phy_lane_mapping = { + PHY_LANE_0, + PHY_LANE_1, + PHY_LANE_2, + PHY_LANE_CK, + }, + .unsupported_odd_h_timings = false, + .external_irq_controller = true, + + .init_resources = vc5_hdmi_init_resources, + .csc_setup = vc5_hdmi_csc_setup, + .reset = vc5_hdmi_reset, + .set_timings = vc5_hdmi_set_timings, + .phy_init = vc6_hdmi_phy_init, + .phy_disable = vc6_hdmi_phy_disable, + .channel_map = vc5_hdmi_channel_map, + .supports_hdr = true, + .hp_detect = vc5_hdmi_hp_detect, +}; + static const struct of_device_id vc4_hdmi_dt_match[] = { { .compatible = "brcm,bcm2835-hdmi", .data = &bcm2835_variant }, { .compatible = "brcm,bcm2711-hdmi0", .data = &bcm2711_hdmi0_variant }, { .compatible = "brcm,bcm2711-hdmi1", .data = &bcm2711_hdmi1_variant }, + { .compatible = "brcm,bcm2712-hdmi0", .data = &bcm2712_hdmi0_variant }, + { .compatible = "brcm,bcm2712-hdmi1", .data = &bcm2712_hdmi1_variant }, {} }; diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.h b/drivers/gpu/drm/vc4/vc4_hdmi.h index b37f1d2c3fe5..b2424a21da23 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.h +++ b/drivers/gpu/drm/vc4/vc4_hdmi.h @@ -237,4 +237,8 @@ void vc5_hdmi_phy_disable(struct vc4_hdmi *vc4_hdmi); void vc5_hdmi_phy_rng_enable(struct vc4_hdmi *vc4_hdmi); void vc5_hdmi_phy_rng_disable(struct vc4_hdmi *vc4_hdmi); +void vc6_hdmi_phy_init(struct vc4_hdmi *vc4_hdmi, + struct drm_connector_state *conn_state); +void vc6_hdmi_phy_disable(struct vc4_hdmi *vc4_hdmi); + #endif /* _VC4_HDMI_H_ */ diff --git a/drivers/gpu/drm/vc4/vc4_hdmi_phy.c b/drivers/gpu/drm/vc4/vc4_hdmi_phy.c index 1f5507fc7a03..56e6a35da357 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi_phy.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi_phy.c @@ -125,6 +125,48 @@ #define VC4_HDMI_RM_FORMAT_SHIFT_SHIFT 24 #define VC4_HDMI_RM_FORMAT_SHIFT_MASK VC4_MASK(25, 24) +#define VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_BG_PWRUP BIT(8) +#define VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_LDO_PWRUP BIT(7) +#define VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_BIAS_PWRUP BIT(6) +#define VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_RNDGEN_PWRUP BIT(4) +#define VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_TX_CK_PWRUP BIT(3) +#define VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_TX_2_PWRUP BIT(2) +#define VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_TX_1_PWRUP BIT(1) +#define VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_TX_0_PWRUP BIT(0) + +#define VC6_HDMI_TX_PHY_PLL_REFCLK_REFCLK_SEL_CMOS BIT(13) +#define VC6_HDMI_TX_PHY_PLL_REFCLK_REFFRQ_MASK VC4_MASK(9, 0) + +#define VC6_HDMI_TX_PHY_PLL_POST_KDIV_CLK0_SEL_MASK VC4_MASK(3, 2) +#define VC6_HDMI_TX_PHY_PLL_POST_KDIV_KDIV_MASK VC4_MASK(1, 0) + +#define VC6_HDMI_TX_PHY_PLL_VCOCLK_DIV_VCODIV_EN BIT(10) +#define VC6_HDMI_TX_PHY_PLL_VCOCLK_DIV_VCODIV_MASK VC4_MASK(9, 0) + +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_CTL_MASK VC4_MASK(31, 28) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_ENABLE_MASK VC4_MASK(27, 27) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_RATE_CTL_MASK VC4_MASK(26, 26) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_POST_TAP_EN_MASK VC4_MASK(25, 25) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_LDMOS_BIAS_CTL_MASK VC4_MASK(24, 23) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_COM_MODE_LDMOS_EN_MASK VC4_MASK(22, 22) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EDGE_SEL_MASK VC4_MASK(21, 21) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_HS_EN_MASK VC4_MASK(20, 20) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_TERM_CTL_MASK VC4_MASK(19, 18) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_EN_MASK VC4_MASK(17, 17) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_EN_MASK VC4_MASK(16, 16) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_CTL_MASK VC4_MASK(15, 12) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_HS_EN_MASK VC4_MASK(11, 11) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_MAIN_TAP_CURRENT_SELECT_MASK VC4_MASK(10, 8) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_POST_TAP_CURRENT_SELECT_MASK VC4_MASK(7, 5) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_LOADING_MASK VC4_MASK(4, 3) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_DRIVING_MASK VC4_MASK(2, 1) +#define VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_PRE_TAP_EN_MASK VC4_MASK(0, 0) + +#define VC6_HDMI_TX_PHY_PLL_RESET_CTL_PLL_PLLPOST_RESETB BIT(1) +#define VC6_HDMI_TX_PHY_PLL_RESET_CTL_PLL_RESETB BIT(0) + +#define VC6_HDMI_TX_PHY_PLL_POWERUP_CTL_PLL_PWRUP BIT(0) + #define OSCILLATOR_FREQUENCY 54000000 void vc4_hdmi_phy_init(struct vc4_hdmi *vc4_hdmi, @@ -558,3 +600,601 @@ void vc5_hdmi_phy_rng_disable(struct vc4_hdmi *vc4_hdmi) VC4_HDMI_TX_PHY_POWERDOWN_CTL_RNDGEN_PWRDN); spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags); } + +#define VC6_VCO_MIN_FREQ (8ULL * 1000 * 1000 * 1000) +#define VC6_VCO_MAX_FREQ (12ULL * 1000 * 1000 * 1000) + +static unsigned long long +vc6_phy_get_vco_freq(unsigned long long tmds_rate, unsigned int *vco_div) +{ + unsigned int min_div; + unsigned int max_div; + unsigned int div; + + div = 0; + while (tmds_rate * div * 10 < VC6_VCO_MIN_FREQ) + div++; + min_div = div; + + while (tmds_rate * (div + 1) * 10 < VC6_VCO_MAX_FREQ) + div++; + max_div = div; + + div = min_div + (max_div - min_div) / 2; + + *vco_div = div; + return tmds_rate * div * 10; +} + +struct vc6_phy_lane_settings { + unsigned int ext_current_ctl:4; + unsigned int ffe_enable:1; + unsigned int slew_rate_ctl:1; + unsigned int ffe_post_tap_en:1; + unsigned int ldmos_bias_ctl:2; + unsigned int com_mode_ldmos_en:1; + unsigned int edge_sel:1; + unsigned int ext_current_src_hs_en:1; + unsigned int term_ctl:2; + unsigned int ext_current_src_en:1; + unsigned int int_current_src_en:1; + unsigned int int_current_ctl:4; + unsigned int int_current_src_hs_en:1; + unsigned int main_tap_current_select:3; + unsigned int post_tap_current_select:3; + unsigned int slew_ctl_slow_loading:2; + unsigned int slew_ctl_slow_driving:2; + unsigned int ffe_pre_tap_en:1; +}; + +struct vc6_phy_settings { + unsigned long long min_rate; + unsigned long long max_rate; + struct vc6_phy_lane_settings channel[3]; + struct vc6_phy_lane_settings clock; +}; + +static const struct vc6_phy_settings vc6_hdmi_phy_settings[] = { + { + 0, 222000000, + { + { + /* 200mA */ + .ext_current_ctl = 8, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* 200mA */ + .int_current_ctl = 8, + + /* 17.6 mA */ + .main_tap_current_select = 7, + }, + { + /* 200mA */ + .ext_current_ctl = 8, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* 200mA */ + .int_current_ctl = 8, + + /* 17.6 mA */ + .main_tap_current_select = 7, + }, + { + /* 200mA */ + .ext_current_ctl = 8, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* 200mA */ + .int_current_ctl = 8, + + /* 17.6 mA */ + .main_tap_current_select = 7, + }, + }, + { + /* 200mA */ + .ext_current_ctl = 8, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* 200mA */ + .int_current_ctl = 8, + + /* 17.6 mA */ + .main_tap_current_select = 7, + }, + }, + { + 222000001, 297000000, + { + { + /* 200mA and 180mA ?! */ + .ext_current_ctl = 12, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* 100 Ohm */ + .term_ctl = 1, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* Enable Internal Current Source */ + .int_current_src_en = 1, + }, + { + /* 200mA and 180mA ?! */ + .ext_current_ctl = 12, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* 100 Ohm */ + .term_ctl = 1, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* Enable Internal Current Source */ + .int_current_src_en = 1, + }, + { + /* 200mA and 180mA ?! */ + .ext_current_ctl = 12, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* 100 Ohm */ + .term_ctl = 1, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* Enable Internal Current Source */ + .int_current_src_en = 1, + }, + }, + { + /* 200mA and 180mA ?! */ + .ext_current_ctl = 12, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* 100 Ohm */ + .term_ctl = 1, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* Enable Internal Current Source */ + .int_current_src_en = 1, + + /* Internal Current Source Half Swing Enable*/ + .int_current_src_hs_en = 1, + }, + }, + { + 297000001, 597000044, + { + { + /* 200mA */ + .ext_current_ctl = 8, + + /* Normal Slew Rate Control */ + .slew_rate_ctl = 1, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* 50 Ohms */ + .term_ctl = 3, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* Enable Internal Current Source */ + .int_current_src_en = 1, + + /* 200mA */ + .int_current_ctl = 8, + + /* 17.6 mA */ + .main_tap_current_select = 7, + }, + { + /* 200mA */ + .ext_current_ctl = 8, + + /* Normal Slew Rate Control */ + .slew_rate_ctl = 1, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* 50 Ohms */ + .term_ctl = 3, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* Enable Internal Current Source */ + .int_current_src_en = 1, + + /* 200mA */ + .int_current_ctl = 8, + + /* 17.6 mA */ + .main_tap_current_select = 7, + }, + { + /* 200mA */ + .ext_current_ctl = 8, + + /* Normal Slew Rate Control */ + .slew_rate_ctl = 1, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* 50 Ohms */ + .term_ctl = 3, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* Enable Internal Current Source */ + .int_current_src_en = 1, + + /* 200mA */ + .int_current_ctl = 8, + + /* 17.6 mA */ + .main_tap_current_select = 7, + }, + }, + { + /* 200mA */ + .ext_current_ctl = 8, + + /* Normal Slew Rate Control */ + .slew_rate_ctl = 1, + + /* 0.85V */ + .ldmos_bias_ctl = 1, + + /* External Current Source Half Swing Enable*/ + .ext_current_src_hs_en = 1, + + /* 50 Ohms */ + .term_ctl = 3, + + /* Enable External Current Source */ + .ext_current_src_en = 1, + + /* Enable Internal Current Source */ + .int_current_src_en = 1, + + /* 200mA */ + .int_current_ctl = 8, + + /* Internal Current Source Half Swing Enable*/ + .int_current_src_hs_en = 1, + + /* 17.6 mA */ + .main_tap_current_select = 7, + }, + }, +}; + +static const struct vc6_phy_settings * +vc6_phy_get_settings(unsigned long long tmds_rate) +{ + unsigned int count = ARRAY_SIZE(vc6_hdmi_phy_settings); + unsigned int i; + + for (i = 0; i < count; i++) { + const struct vc6_phy_settings *s = &vc6_hdmi_phy_settings[i]; + + if (tmds_rate >= s->min_rate && tmds_rate <= s->max_rate) + return s; + } + + /* + * If the pixel clock exceeds our max setting, try the max + * setting anyway. + */ + return &vc6_hdmi_phy_settings[count - 1]; +} + +static const struct vc6_phy_lane_settings * +vc6_phy_get_channel_settings(enum vc4_hdmi_phy_channel chan, + unsigned long long tmds_rate) +{ + const struct vc6_phy_settings *settings = vc6_phy_get_settings(tmds_rate); + + if (chan == PHY_LANE_CK) + return &settings->clock; + + return &settings->channel[chan]; +} + +static void vc6_hdmi_reset_phy(struct vc4_hdmi *vc4_hdmi) +{ + lockdep_assert_held(&vc4_hdmi->hw_lock); + + HDMI_WRITE(HDMI_TX_PHY_RESET_CTL, 0); + HDMI_WRITE(HDMI_TX_PHY_POWERUP_CTL, 0); +} + +void vc6_hdmi_phy_init(struct vc4_hdmi *vc4_hdmi, + struct drm_connector_state *conn_state) +{ + const struct vc6_phy_lane_settings *chan0_settings; + const struct vc6_phy_lane_settings *chan1_settings; + const struct vc6_phy_lane_settings *chan2_settings; + const struct vc6_phy_lane_settings *clock_settings; + const struct vc4_hdmi_variant *variant = vc4_hdmi->variant; + unsigned long long pixel_freq = conn_state->hdmi.tmds_char_rate; + unsigned long long vco_freq; + unsigned char word_sel; + unsigned long flags; + unsigned int vco_div; + + vco_freq = vc6_phy_get_vco_freq(pixel_freq, &vco_div); + + spin_lock_irqsave(&vc4_hdmi->hw_lock, flags); + + vc6_hdmi_reset_phy(vc4_hdmi); + + HDMI_WRITE(HDMI_TX_PHY_PLL_MISC_0, 0x810c6000); + HDMI_WRITE(HDMI_TX_PHY_PLL_MISC_1, 0x00b8c451); + HDMI_WRITE(HDMI_TX_PHY_PLL_MISC_2, 0x46402e31); + HDMI_WRITE(HDMI_TX_PHY_PLL_MISC_3, 0x00b8c005); + HDMI_WRITE(HDMI_TX_PHY_PLL_MISC_4, 0x42410261); + HDMI_WRITE(HDMI_TX_PHY_PLL_MISC_5, 0xcc021001); + HDMI_WRITE(HDMI_TX_PHY_PLL_MISC_6, 0xc8301c80); + HDMI_WRITE(HDMI_TX_PHY_PLL_MISC_7, 0xb0804444); + HDMI_WRITE(HDMI_TX_PHY_PLL_MISC_8, 0xf80f8000); + + HDMI_WRITE(HDMI_TX_PHY_PLL_REFCLK, + VC6_HDMI_TX_PHY_PLL_REFCLK_REFCLK_SEL_CMOS | + VC4_SET_FIELD(54, VC6_HDMI_TX_PHY_PLL_REFCLK_REFFRQ)); + + HDMI_WRITE(HDMI_TX_PHY_RESET_CTL, 0x7f); + + HDMI_WRITE(HDMI_RM_OFFSET, + VC4_HDMI_RM_OFFSET_ONLY | + VC4_SET_FIELD(phy_get_rm_offset(vco_freq), + VC4_HDMI_RM_OFFSET_OFFSET)); + + HDMI_WRITE(HDMI_TX_PHY_PLL_VCOCLK_DIV, + VC6_HDMI_TX_PHY_PLL_VCOCLK_DIV_VCODIV_EN | + VC4_SET_FIELD(vco_div, + VC6_HDMI_TX_PHY_PLL_VCOCLK_DIV_VCODIV)); + + HDMI_WRITE(HDMI_TX_PHY_PLL_CFG, + VC4_SET_FIELD(0, VC4_HDMI_TX_PHY_PLL_CFG_PDIV)); + + HDMI_WRITE(HDMI_TX_PHY_PLL_POST_KDIV, + VC4_SET_FIELD(2, VC6_HDMI_TX_PHY_PLL_POST_KDIV_CLK0_SEL) | + VC4_SET_FIELD(1, VC6_HDMI_TX_PHY_PLL_POST_KDIV_KDIV)); + + chan0_settings = + vc6_phy_get_channel_settings(variant->phy_lane_mapping[PHY_LANE_0], + pixel_freq); + HDMI_WRITE(HDMI_TX_PHY_CTL_0, + VC4_SET_FIELD(chan0_settings->ext_current_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_CTL) | + VC4_SET_FIELD(chan0_settings->ffe_enable, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_ENABLE) | + VC4_SET_FIELD(chan0_settings->slew_rate_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_RATE_CTL) | + VC4_SET_FIELD(chan0_settings->ffe_post_tap_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_POST_TAP_EN) | + VC4_SET_FIELD(chan0_settings->ldmos_bias_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_LDMOS_BIAS_CTL) | + VC4_SET_FIELD(chan0_settings->com_mode_ldmos_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_COM_MODE_LDMOS_EN) | + VC4_SET_FIELD(chan0_settings->edge_sel, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EDGE_SEL) | + VC4_SET_FIELD(chan0_settings->ext_current_src_hs_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_HS_EN) | + VC4_SET_FIELD(chan0_settings->term_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_TERM_CTL) | + VC4_SET_FIELD(chan0_settings->ext_current_src_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_EN) | + VC4_SET_FIELD(chan0_settings->int_current_src_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_EN) | + VC4_SET_FIELD(chan0_settings->int_current_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_CTL) | + VC4_SET_FIELD(chan0_settings->int_current_src_hs_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_HS_EN) | + VC4_SET_FIELD(chan0_settings->main_tap_current_select, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_MAIN_TAP_CURRENT_SELECT) | + VC4_SET_FIELD(chan0_settings->post_tap_current_select, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_POST_TAP_CURRENT_SELECT) | + VC4_SET_FIELD(chan0_settings->slew_ctl_slow_loading, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_LOADING) | + VC4_SET_FIELD(chan0_settings->slew_ctl_slow_driving, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_DRIVING) | + VC4_SET_FIELD(chan0_settings->ffe_pre_tap_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_PRE_TAP_EN)); + + chan1_settings = + vc6_phy_get_channel_settings(variant->phy_lane_mapping[PHY_LANE_1], + pixel_freq); + HDMI_WRITE(HDMI_TX_PHY_CTL_1, + VC4_SET_FIELD(chan1_settings->ext_current_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_CTL) | + VC4_SET_FIELD(chan1_settings->ffe_enable, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_ENABLE) | + VC4_SET_FIELD(chan1_settings->slew_rate_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_RATE_CTL) | + VC4_SET_FIELD(chan1_settings->ffe_post_tap_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_POST_TAP_EN) | + VC4_SET_FIELD(chan1_settings->ldmos_bias_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_LDMOS_BIAS_CTL) | + VC4_SET_FIELD(chan1_settings->com_mode_ldmos_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_COM_MODE_LDMOS_EN) | + VC4_SET_FIELD(chan1_settings->edge_sel, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EDGE_SEL) | + VC4_SET_FIELD(chan1_settings->ext_current_src_hs_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_HS_EN) | + VC4_SET_FIELD(chan1_settings->term_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_TERM_CTL) | + VC4_SET_FIELD(chan1_settings->ext_current_src_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_EN) | + VC4_SET_FIELD(chan1_settings->int_current_src_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_EN) | + VC4_SET_FIELD(chan1_settings->int_current_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_CTL) | + VC4_SET_FIELD(chan1_settings->int_current_src_hs_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_HS_EN) | + VC4_SET_FIELD(chan1_settings->main_tap_current_select, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_MAIN_TAP_CURRENT_SELECT) | + VC4_SET_FIELD(chan1_settings->post_tap_current_select, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_POST_TAP_CURRENT_SELECT) | + VC4_SET_FIELD(chan1_settings->slew_ctl_slow_loading, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_LOADING) | + VC4_SET_FIELD(chan1_settings->slew_ctl_slow_driving, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_DRIVING) | + VC4_SET_FIELD(chan1_settings->ffe_pre_tap_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_PRE_TAP_EN)); + + chan2_settings = + vc6_phy_get_channel_settings(variant->phy_lane_mapping[PHY_LANE_2], + pixel_freq); + HDMI_WRITE(HDMI_TX_PHY_CTL_2, + VC4_SET_FIELD(chan2_settings->ext_current_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_CTL) | + VC4_SET_FIELD(chan2_settings->ffe_enable, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_ENABLE) | + VC4_SET_FIELD(chan2_settings->slew_rate_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_RATE_CTL) | + VC4_SET_FIELD(chan2_settings->ffe_post_tap_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_POST_TAP_EN) | + VC4_SET_FIELD(chan2_settings->ldmos_bias_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_LDMOS_BIAS_CTL) | + VC4_SET_FIELD(chan2_settings->com_mode_ldmos_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_COM_MODE_LDMOS_EN) | + VC4_SET_FIELD(chan2_settings->edge_sel, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EDGE_SEL) | + VC4_SET_FIELD(chan2_settings->ext_current_src_hs_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_HS_EN) | + VC4_SET_FIELD(chan2_settings->term_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_TERM_CTL) | + VC4_SET_FIELD(chan2_settings->ext_current_src_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_EN) | + VC4_SET_FIELD(chan2_settings->int_current_src_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_EN) | + VC4_SET_FIELD(chan2_settings->int_current_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_CTL) | + VC4_SET_FIELD(chan2_settings->int_current_src_hs_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_HS_EN) | + VC4_SET_FIELD(chan2_settings->main_tap_current_select, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_MAIN_TAP_CURRENT_SELECT) | + VC4_SET_FIELD(chan2_settings->post_tap_current_select, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_POST_TAP_CURRENT_SELECT) | + VC4_SET_FIELD(chan2_settings->slew_ctl_slow_loading, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_LOADING) | + VC4_SET_FIELD(chan2_settings->slew_ctl_slow_driving, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_DRIVING) | + VC4_SET_FIELD(chan2_settings->ffe_pre_tap_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_PRE_TAP_EN)); + + clock_settings = + vc6_phy_get_channel_settings(variant->phy_lane_mapping[PHY_LANE_CK], + pixel_freq); + HDMI_WRITE(HDMI_TX_PHY_CTL_CK, + VC4_SET_FIELD(clock_settings->ext_current_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_CTL) | + VC4_SET_FIELD(clock_settings->ffe_enable, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_ENABLE) | + VC4_SET_FIELD(clock_settings->slew_rate_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_RATE_CTL) | + VC4_SET_FIELD(clock_settings->ffe_post_tap_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_POST_TAP_EN) | + VC4_SET_FIELD(clock_settings->ldmos_bias_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_LDMOS_BIAS_CTL) | + VC4_SET_FIELD(clock_settings->com_mode_ldmos_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_COM_MODE_LDMOS_EN) | + VC4_SET_FIELD(clock_settings->edge_sel, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EDGE_SEL) | + VC4_SET_FIELD(clock_settings->ext_current_src_hs_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_HS_EN) | + VC4_SET_FIELD(clock_settings->term_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_TERM_CTL) | + VC4_SET_FIELD(clock_settings->ext_current_src_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_EXT_CURRENT_SRC_EN) | + VC4_SET_FIELD(clock_settings->int_current_src_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_EN) | + VC4_SET_FIELD(clock_settings->int_current_ctl, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_CTL) | + VC4_SET_FIELD(clock_settings->int_current_src_hs_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_INT_CURRENT_SRC_HS_EN) | + VC4_SET_FIELD(clock_settings->main_tap_current_select, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_MAIN_TAP_CURRENT_SELECT) | + VC4_SET_FIELD(clock_settings->post_tap_current_select, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_POST_TAP_CURRENT_SELECT) | + VC4_SET_FIELD(clock_settings->slew_ctl_slow_loading, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_LOADING) | + VC4_SET_FIELD(clock_settings->slew_ctl_slow_driving, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_SLEW_CTL_SLOW_DRIVING) | + VC4_SET_FIELD(clock_settings->ffe_pre_tap_en, + VC6_HDMI_TX_PHY_HDMI_CTRL_CHX_FFE_PRE_TAP_EN)); + + if (pixel_freq >= 340000000) + word_sel = 3; + else + word_sel = 0; + HDMI_WRITE(HDMI_TX_PHY_TMDS_CLK_WORD_SEL, word_sel); + + HDMI_WRITE(HDMI_TX_PHY_POWERUP_CTL, + VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_BG_PWRUP | + VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_LDO_PWRUP | + VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_BIAS_PWRUP | + VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_TX_CK_PWRUP | + VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_TX_2_PWRUP | + VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_TX_1_PWRUP | + VC6_HDMI_TX_PHY_HDMI_POWERUP_CTL_TX_0_PWRUP); + + HDMI_WRITE(HDMI_TX_PHY_PLL_POWERUP_CTL, + VC6_HDMI_TX_PHY_PLL_POWERUP_CTL_PLL_PWRUP); + + HDMI_WRITE(HDMI_TX_PHY_PLL_RESET_CTL, + HDMI_READ(HDMI_TX_PHY_PLL_RESET_CTL) & + ~VC6_HDMI_TX_PHY_PLL_RESET_CTL_PLL_RESETB); + + HDMI_WRITE(HDMI_TX_PHY_PLL_RESET_CTL, + HDMI_READ(HDMI_TX_PHY_PLL_RESET_CTL) | + VC6_HDMI_TX_PHY_PLL_RESET_CTL_PLL_RESETB); + + spin_unlock_irqrestore(&vc4_hdmi->hw_lock, flags); +} + +void vc6_hdmi_phy_disable(struct vc4_hdmi *vc4_hdmi) +{ +} diff --git a/drivers/gpu/drm/vc4/vc4_hdmi_regs.h b/drivers/gpu/drm/vc4/vc4_hdmi_regs.h index 68455ce513e7..59bfd69f54d9 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi_regs.h +++ b/drivers/gpu/drm/vc4/vc4_hdmi_regs.h @@ -111,13 +111,30 @@ enum vc4_hdmi_field { HDMI_TX_PHY_CTL_1, HDMI_TX_PHY_CTL_2, HDMI_TX_PHY_CTL_3, + HDMI_TX_PHY_CTL_CK, HDMI_TX_PHY_PLL_CALIBRATION_CONFIG_1, HDMI_TX_PHY_PLL_CALIBRATION_CONFIG_2, HDMI_TX_PHY_PLL_CALIBRATION_CONFIG_4, HDMI_TX_PHY_PLL_CFG, + HDMI_TX_PHY_PLL_CFG_PDIV, HDMI_TX_PHY_PLL_CTL_0, HDMI_TX_PHY_PLL_CTL_1, + HDMI_TX_PHY_PLL_MISC_0, + HDMI_TX_PHY_PLL_MISC_1, + HDMI_TX_PHY_PLL_MISC_2, + HDMI_TX_PHY_PLL_MISC_3, + HDMI_TX_PHY_PLL_MISC_4, + HDMI_TX_PHY_PLL_MISC_5, + HDMI_TX_PHY_PLL_MISC_6, + HDMI_TX_PHY_PLL_MISC_7, + HDMI_TX_PHY_PLL_MISC_8, + HDMI_TX_PHY_PLL_POST_KDIV, + HDMI_TX_PHY_PLL_POWERUP_CTL, + HDMI_TX_PHY_PLL_REFCLK, + HDMI_TX_PHY_PLL_RESET_CTL, + HDMI_TX_PHY_PLL_VCOCLK_DIV, HDMI_TX_PHY_POWERDOWN_CTL, + HDMI_TX_PHY_POWERUP_CTL, HDMI_TX_PHY_RESET_CTL, HDMI_TX_PHY_TMDS_CLK_WORD_SEL, HDMI_VEC_INTERFACE_CFG, @@ -411,6 +428,206 @@ static const struct vc4_hdmi_register __maybe_unused vc5_hdmi_hdmi1_fields[] = { VC5_CSC_REG(HDMI_CSC_CHANNEL_CTL, 0x02c), }; +static const struct vc4_hdmi_register __maybe_unused vc6_hdmi_hdmi0_fields[] = { + VC4_HD_REG(HDMI_DVP_CTL, 0x0000), + VC4_HD_REG(HDMI_MAI_CTL, 0x0010), + VC4_HD_REG(HDMI_MAI_THR, 0x0014), + VC4_HD_REG(HDMI_MAI_FMT, 0x0018), + VC4_HD_REG(HDMI_MAI_DATA, 0x001c), + VC4_HD_REG(HDMI_MAI_SMP, 0x0020), + VC4_HD_REG(HDMI_VID_CTL, 0x0044), + VC4_HD_REG(HDMI_FRAME_COUNT, 0x0060), + + VC4_HDMI_REG(HDMI_FIFO_CTL, 0x07c), + VC4_HDMI_REG(HDMI_AUDIO_PACKET_CONFIG, 0x0c0), + VC4_HDMI_REG(HDMI_RAM_PACKET_CONFIG, 0x0c4), + VC4_HDMI_REG(HDMI_RAM_PACKET_STATUS, 0x0cc), + VC4_HDMI_REG(HDMI_CRP_CFG, 0x0d0), + VC4_HDMI_REG(HDMI_CTS_0, 0x0d4), + VC4_HDMI_REG(HDMI_CTS_1, 0x0d8), + VC4_HDMI_REG(HDMI_SCHEDULER_CONTROL, 0x0e8), + VC4_HDMI_REG(HDMI_HORZA, 0x0ec), + VC4_HDMI_REG(HDMI_HORZB, 0x0f0), + VC4_HDMI_REG(HDMI_VERTA0, 0x0f4), + VC4_HDMI_REG(HDMI_VERTB0, 0x0f8), + VC4_HDMI_REG(HDMI_VERTA1, 0x100), + VC4_HDMI_REG(HDMI_VERTB1, 0x104), + VC4_HDMI_REG(HDMI_MISC_CONTROL, 0x114), + VC4_HDMI_REG(HDMI_MAI_CHANNEL_MAP, 0x0a4), + VC4_HDMI_REG(HDMI_MAI_CONFIG, 0x0a8), + VC4_HDMI_REG(HDMI_FORMAT_DET_1, 0x148), + VC4_HDMI_REG(HDMI_FORMAT_DET_2, 0x14c), + VC4_HDMI_REG(HDMI_FORMAT_DET_3, 0x150), + VC4_HDMI_REG(HDMI_FORMAT_DET_4, 0x158), + VC4_HDMI_REG(HDMI_FORMAT_DET_5, 0x15c), + VC4_HDMI_REG(HDMI_FORMAT_DET_6, 0x160), + VC4_HDMI_REG(HDMI_FORMAT_DET_7, 0x164), + VC4_HDMI_REG(HDMI_FORMAT_DET_8, 0x168), + VC4_HDMI_REG(HDMI_FORMAT_DET_9, 0x16c), + VC4_HDMI_REG(HDMI_FORMAT_DET_10, 0x170), + VC4_HDMI_REG(HDMI_DEEP_COLOR_CONFIG_1, 0x18c), + VC4_HDMI_REG(HDMI_GCP_CONFIG, 0x194), + VC4_HDMI_REG(HDMI_GCP_WORD_1, 0x198), + VC4_HDMI_REG(HDMI_HOTPLUG, 0x1c8), + VC4_HDMI_REG(HDMI_SCRAMBLER_CTL, 0x1e4), + + VC5_DVP_REG(HDMI_CLOCK_STOP, 0x0bc), + VC5_DVP_REG(HDMI_VEC_INTERFACE_CFG, 0x0f0), + VC5_DVP_REG(HDMI_VEC_INTERFACE_XBAR, 0x0f4), + + VC5_PHY_REG(HDMI_TX_PHY_RESET_CTL, 0x000), + VC5_PHY_REG(HDMI_TX_PHY_POWERUP_CTL, 0x004), + VC5_PHY_REG(HDMI_TX_PHY_CTL_0, 0x008), + VC5_PHY_REG(HDMI_TX_PHY_CTL_1, 0x00c), + VC5_PHY_REG(HDMI_TX_PHY_CTL_2, 0x010), + VC5_PHY_REG(HDMI_TX_PHY_CTL_CK, 0x014), + VC5_PHY_REG(HDMI_TX_PHY_PLL_REFCLK, 0x01c), + VC5_PHY_REG(HDMI_TX_PHY_PLL_POST_KDIV, 0x028), + VC5_PHY_REG(HDMI_TX_PHY_PLL_VCOCLK_DIV, 0x02c), + VC5_PHY_REG(HDMI_TX_PHY_PLL_CFG, 0x044), + VC5_PHY_REG(HDMI_TX_PHY_TMDS_CLK_WORD_SEL, 0x054), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_0, 0x060), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_1, 0x064), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_2, 0x068), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_3, 0x06c), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_4, 0x070), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_5, 0x074), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_6, 0x078), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_7, 0x07c), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_8, 0x080), + VC5_PHY_REG(HDMI_TX_PHY_PLL_RESET_CTL, 0x190), + VC5_PHY_REG(HDMI_TX_PHY_PLL_POWERUP_CTL, 0x194), + + VC5_RM_REG(HDMI_RM_CONTROL, 0x000), + VC5_RM_REG(HDMI_RM_OFFSET, 0x018), + VC5_RM_REG(HDMI_RM_FORMAT, 0x01c), + + VC5_RAM_REG(HDMI_RAM_PACKET_START, 0x000), + + VC5_CEC_REG(HDMI_CEC_CNTRL_1, 0x010), + VC5_CEC_REG(HDMI_CEC_CNTRL_2, 0x014), + VC5_CEC_REG(HDMI_CEC_CNTRL_3, 0x018), + VC5_CEC_REG(HDMI_CEC_CNTRL_4, 0x01c), + VC5_CEC_REG(HDMI_CEC_CNTRL_5, 0x020), + VC5_CEC_REG(HDMI_CEC_TX_DATA_1, 0x028), + VC5_CEC_REG(HDMI_CEC_TX_DATA_2, 0x02c), + VC5_CEC_REG(HDMI_CEC_TX_DATA_3, 0x030), + VC5_CEC_REG(HDMI_CEC_TX_DATA_4, 0x034), + VC5_CEC_REG(HDMI_CEC_RX_DATA_1, 0x038), + VC5_CEC_REG(HDMI_CEC_RX_DATA_2, 0x03c), + VC5_CEC_REG(HDMI_CEC_RX_DATA_3, 0x040), + VC5_CEC_REG(HDMI_CEC_RX_DATA_4, 0x044), + + VC5_CSC_REG(HDMI_CSC_CTL, 0x000), + VC5_CSC_REG(HDMI_CSC_12_11, 0x004), + VC5_CSC_REG(HDMI_CSC_14_13, 0x008), + VC5_CSC_REG(HDMI_CSC_22_21, 0x00c), + VC5_CSC_REG(HDMI_CSC_24_23, 0x010), + VC5_CSC_REG(HDMI_CSC_32_31, 0x014), + VC5_CSC_REG(HDMI_CSC_34_33, 0x018), + VC5_CSC_REG(HDMI_CSC_CHANNEL_CTL, 0x02c), +}; + +static const struct vc4_hdmi_register __maybe_unused vc6_hdmi_hdmi1_fields[] = { + VC4_HD_REG(HDMI_DVP_CTL, 0x0000), + VC4_HD_REG(HDMI_MAI_CTL, 0x0030), + VC4_HD_REG(HDMI_MAI_THR, 0x0034), + VC4_HD_REG(HDMI_MAI_FMT, 0x0038), + VC4_HD_REG(HDMI_MAI_DATA, 0x003c), + VC4_HD_REG(HDMI_MAI_SMP, 0x0040), + VC4_HD_REG(HDMI_VID_CTL, 0x0048), + VC4_HD_REG(HDMI_FRAME_COUNT, 0x0064), + + VC4_HDMI_REG(HDMI_FIFO_CTL, 0x07c), + VC4_HDMI_REG(HDMI_AUDIO_PACKET_CONFIG, 0x0c0), + VC4_HDMI_REG(HDMI_RAM_PACKET_CONFIG, 0x0c4), + VC4_HDMI_REG(HDMI_RAM_PACKET_STATUS, 0x0cc), + VC4_HDMI_REG(HDMI_CRP_CFG, 0x0d0), + VC4_HDMI_REG(HDMI_CTS_0, 0x0d4), + VC4_HDMI_REG(HDMI_CTS_1, 0x0d8), + VC4_HDMI_REG(HDMI_SCHEDULER_CONTROL, 0x0e8), + VC4_HDMI_REG(HDMI_HORZA, 0x0ec), + VC4_HDMI_REG(HDMI_HORZB, 0x0f0), + VC4_HDMI_REG(HDMI_VERTA0, 0x0f4), + VC4_HDMI_REG(HDMI_VERTB0, 0x0f8), + VC4_HDMI_REG(HDMI_VERTA1, 0x100), + VC4_HDMI_REG(HDMI_VERTB1, 0x104), + VC4_HDMI_REG(HDMI_MISC_CONTROL, 0x114), + VC4_HDMI_REG(HDMI_MAI_CHANNEL_MAP, 0x0a4), + VC4_HDMI_REG(HDMI_MAI_CONFIG, 0x0a8), + VC4_HDMI_REG(HDMI_FORMAT_DET_1, 0x148), + VC4_HDMI_REG(HDMI_FORMAT_DET_2, 0x14c), + VC4_HDMI_REG(HDMI_FORMAT_DET_3, 0x150), + VC4_HDMI_REG(HDMI_FORMAT_DET_4, 0x158), + VC4_HDMI_REG(HDMI_FORMAT_DET_5, 0x15c), + VC4_HDMI_REG(HDMI_FORMAT_DET_6, 0x160), + VC4_HDMI_REG(HDMI_FORMAT_DET_7, 0x164), + VC4_HDMI_REG(HDMI_FORMAT_DET_8, 0x168), + VC4_HDMI_REG(HDMI_FORMAT_DET_9, 0x16c), + VC4_HDMI_REG(HDMI_FORMAT_DET_10, 0x170), + VC4_HDMI_REG(HDMI_DEEP_COLOR_CONFIG_1, 0x18c), + VC4_HDMI_REG(HDMI_GCP_CONFIG, 0x194), + VC4_HDMI_REG(HDMI_GCP_WORD_1, 0x198), + VC4_HDMI_REG(HDMI_HOTPLUG, 0x1c8), + VC4_HDMI_REG(HDMI_SCRAMBLER_CTL, 0x1e4), + + VC5_DVP_REG(HDMI_CLOCK_STOP, 0x0bc), + VC5_DVP_REG(HDMI_VEC_INTERFACE_CFG, 0x0f0), + VC5_DVP_REG(HDMI_VEC_INTERFACE_XBAR, 0x0f4), + + VC5_PHY_REG(HDMI_TX_PHY_RESET_CTL, 0x000), + VC5_PHY_REG(HDMI_TX_PHY_POWERUP_CTL, 0x004), + VC5_PHY_REG(HDMI_TX_PHY_CTL_0, 0x008), + VC5_PHY_REG(HDMI_TX_PHY_CTL_1, 0x00c), + VC5_PHY_REG(HDMI_TX_PHY_CTL_2, 0x010), + VC5_PHY_REG(HDMI_TX_PHY_CTL_CK, 0x014), + VC5_PHY_REG(HDMI_TX_PHY_PLL_REFCLK, 0x01c), + VC5_PHY_REG(HDMI_TX_PHY_PLL_POST_KDIV, 0x028), + VC5_PHY_REG(HDMI_TX_PHY_PLL_VCOCLK_DIV, 0x02c), + VC5_PHY_REG(HDMI_TX_PHY_PLL_CFG, 0x044), + VC5_PHY_REG(HDMI_TX_PHY_TMDS_CLK_WORD_SEL, 0x054), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_0, 0x060), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_1, 0x064), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_2, 0x068), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_3, 0x06c), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_4, 0x070), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_5, 0x074), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_6, 0x078), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_7, 0x07c), + VC5_PHY_REG(HDMI_TX_PHY_PLL_MISC_8, 0x080), + VC5_PHY_REG(HDMI_TX_PHY_PLL_RESET_CTL, 0x190), + VC5_PHY_REG(HDMI_TX_PHY_PLL_POWERUP_CTL, 0x194), + + VC5_RM_REG(HDMI_RM_CONTROL, 0x000), + VC5_RM_REG(HDMI_RM_OFFSET, 0x018), + VC5_RM_REG(HDMI_RM_FORMAT, 0x01c), + + VC5_RAM_REG(HDMI_RAM_PACKET_START, 0x000), + + VC5_CEC_REG(HDMI_CEC_CNTRL_1, 0x010), + VC5_CEC_REG(HDMI_CEC_CNTRL_2, 0x014), + VC5_CEC_REG(HDMI_CEC_CNTRL_3, 0x018), + VC5_CEC_REG(HDMI_CEC_CNTRL_4, 0x01c), + VC5_CEC_REG(HDMI_CEC_CNTRL_5, 0x020), + VC5_CEC_REG(HDMI_CEC_TX_DATA_1, 0x028), + VC5_CEC_REG(HDMI_CEC_TX_DATA_2, 0x02c), + VC5_CEC_REG(HDMI_CEC_TX_DATA_3, 0x030), + VC5_CEC_REG(HDMI_CEC_TX_DATA_4, 0x034), + VC5_CEC_REG(HDMI_CEC_RX_DATA_1, 0x038), + VC5_CEC_REG(HDMI_CEC_RX_DATA_2, 0x03c), + VC5_CEC_REG(HDMI_CEC_RX_DATA_3, 0x040), + VC5_CEC_REG(HDMI_CEC_RX_DATA_4, 0x044), + + VC5_CSC_REG(HDMI_CSC_CTL, 0x000), + VC5_CSC_REG(HDMI_CSC_12_11, 0x004), + VC5_CSC_REG(HDMI_CSC_14_13, 0x008), + VC5_CSC_REG(HDMI_CSC_22_21, 0x00c), + VC5_CSC_REG(HDMI_CSC_24_23, 0x010), + VC5_CSC_REG(HDMI_CSC_32_31, 0x014), + VC5_CSC_REG(HDMI_CSC_34_33, 0x018), + VC5_CSC_REG(HDMI_CSC_CHANNEL_CTL, 0x02c), +}; + static inline void __iomem *__vc4_hdmi_get_field_base(struct vc4_hdmi *hdmi, enum vc4_hdmi_regs reg) diff --git a/drivers/gpu/drm/vc4/vc4_hvs.c b/drivers/gpu/drm/vc4/vc4_hvs.c index 70623e6b91e9..b42027636c71 100644 --- a/drivers/gpu/drm/vc4/vc4_hvs.c +++ b/drivers/gpu/drm/vc4/vc4_hvs.c @@ -67,6 +67,140 @@ static const struct debugfs_reg32 vc4_hvs_regs[] = { VC4_REG32(SCALER_OLEDCOEF2), }; +static const struct debugfs_reg32 vc6_hvs_regs[] = { + VC4_REG32(SCALER6_VERSION), + VC4_REG32(SCALER6_CXM_SIZE), + VC4_REG32(SCALER6_LBM_SIZE), + VC4_REG32(SCALER6_UBM_SIZE), + VC4_REG32(SCALER6_COBA_SIZE), + VC4_REG32(SCALER6_COB_SIZE), + VC4_REG32(SCALER6_CONTROL), + VC4_REG32(SCALER6_FETCHER_STATUS), + VC4_REG32(SCALER6_FETCH_STATUS), + VC4_REG32(SCALER6_HANDLE_ERROR), + VC4_REG32(SCALER6_DISP0_CTRL0), + VC4_REG32(SCALER6_DISP0_CTRL1), + VC4_REG32(SCALER6_DISP0_BGND), + VC4_REG32(SCALER6_DISP0_LPTRS), + VC4_REG32(SCALER6_DISP0_COB), + VC4_REG32(SCALER6_DISP0_STATUS), + VC4_REG32(SCALER6_DISP0_DL), + VC4_REG32(SCALER6_DISP0_RUN), + VC4_REG32(SCALER6_DISP1_CTRL0), + VC4_REG32(SCALER6_DISP1_CTRL1), + VC4_REG32(SCALER6_DISP1_BGND), + VC4_REG32(SCALER6_DISP1_LPTRS), + VC4_REG32(SCALER6_DISP1_COB), + VC4_REG32(SCALER6_DISP1_STATUS), + VC4_REG32(SCALER6_DISP1_DL), + VC4_REG32(SCALER6_DISP1_RUN), + VC4_REG32(SCALER6_DISP2_CTRL0), + VC4_REG32(SCALER6_DISP2_CTRL1), + VC4_REG32(SCALER6_DISP2_BGND), + VC4_REG32(SCALER6_DISP2_LPTRS), + VC4_REG32(SCALER6_DISP2_COB), + VC4_REG32(SCALER6_DISP2_STATUS), + VC4_REG32(SCALER6_DISP2_DL), + VC4_REG32(SCALER6_DISP2_RUN), + VC4_REG32(SCALER6_EOLN), + VC4_REG32(SCALER6_DL_STATUS), + VC4_REG32(SCALER6_BFG_MISC), + VC4_REG32(SCALER6_QOS0), + VC4_REG32(SCALER6_PROF0), + VC4_REG32(SCALER6_QOS1), + VC4_REG32(SCALER6_PROF1), + VC4_REG32(SCALER6_QOS2), + VC4_REG32(SCALER6_PROF2), + VC4_REG32(SCALER6_PRI_MAP0), + VC4_REG32(SCALER6_PRI_MAP1), + VC4_REG32(SCALER6_HISTCTRL), + VC4_REG32(SCALER6_HISTBIN0), + VC4_REG32(SCALER6_HISTBIN1), + VC4_REG32(SCALER6_HISTBIN2), + VC4_REG32(SCALER6_HISTBIN3), + VC4_REG32(SCALER6_HISTBIN4), + VC4_REG32(SCALER6_HISTBIN5), + VC4_REG32(SCALER6_HISTBIN6), + VC4_REG32(SCALER6_HISTBIN7), + VC4_REG32(SCALER6_HDR_CFG_REMAP), + VC4_REG32(SCALER6_COL_SPACE), + VC4_REG32(SCALER6_HVS_ID), + VC4_REG32(SCALER6_CFC1), + VC4_REG32(SCALER6_DISP_UPM_ISO0), + VC4_REG32(SCALER6_DISP_UPM_ISO1), + VC4_REG32(SCALER6_DISP_UPM_ISO2), + VC4_REG32(SCALER6_DISP_LBM_ISO0), + VC4_REG32(SCALER6_DISP_LBM_ISO1), + VC4_REG32(SCALER6_DISP_LBM_ISO2), + VC4_REG32(SCALER6_DISP_COB_ISO0), + VC4_REG32(SCALER6_DISP_COB_ISO1), + VC4_REG32(SCALER6_DISP_COB_ISO2), + VC4_REG32(SCALER6_BAD_COB), + VC4_REG32(SCALER6_BAD_LBM), + VC4_REG32(SCALER6_BAD_UPM), + VC4_REG32(SCALER6_BAD_AXI), +}; + +static const struct debugfs_reg32 vc6_d_hvs_regs[] = { + VC4_REG32(SCALER6D_VERSION), + VC4_REG32(SCALER6D_CXM_SIZE), + VC4_REG32(SCALER6D_LBM_SIZE), + VC4_REG32(SCALER6D_UBM_SIZE), + VC4_REG32(SCALER6D_COBA_SIZE), + VC4_REG32(SCALER6D_COB_SIZE), + VC4_REG32(SCALER6D_CONTROL), + VC4_REG32(SCALER6D_FETCHER_STATUS), + VC4_REG32(SCALER6D_FETCH_STATUS), + VC4_REG32(SCALER6D_HANDLE_ERROR), + VC4_REG32(SCALER6D_DISP0_CTRL0), + VC4_REG32(SCALER6D_DISP0_CTRL1), + VC4_REG32(SCALER6D_DISP0_BGND0), + VC4_REG32(SCALER6D_DISP0_BGND1), + VC4_REG32(SCALER6D_DISP0_LPTRS), + VC4_REG32(SCALER6D_DISP0_COB), + VC4_REG32(SCALER6D_DISP0_STATUS), + VC4_REG32(SCALER6D_DISP0_DL), + VC4_REG32(SCALER6D_DISP0_RUN), + VC4_REG32(SCALER6D_DISP1_CTRL0), + VC4_REG32(SCALER6D_DISP1_CTRL1), + VC4_REG32(SCALER6D_DISP1_BGND0), + VC4_REG32(SCALER6D_DISP1_BGND1), + VC4_REG32(SCALER6D_DISP1_LPTRS), + VC4_REG32(SCALER6D_DISP1_COB), + VC4_REG32(SCALER6D_DISP1_STATUS), + VC4_REG32(SCALER6D_DISP1_DL), + VC4_REG32(SCALER6D_DISP1_RUN), + VC4_REG32(SCALER6D_DISP2_CTRL0), + VC4_REG32(SCALER6D_DISP2_CTRL1), + VC4_REG32(SCALER6D_DISP2_BGND0), + VC4_REG32(SCALER6D_DISP2_BGND1), + VC4_REG32(SCALER6D_DISP2_LPTRS), + VC4_REG32(SCALER6D_DISP2_COB), + VC4_REG32(SCALER6D_DISP2_STATUS), + VC4_REG32(SCALER6D_DISP2_DL), + VC4_REG32(SCALER6D_DISP2_RUN), + VC4_REG32(SCALER6D_EOLN), + VC4_REG32(SCALER6D_DL_STATUS), + VC4_REG32(SCALER6D_QOS0), + VC4_REG32(SCALER6D_PROF0), + VC4_REG32(SCALER6D_QOS1), + VC4_REG32(SCALER6D_PROF1), + VC4_REG32(SCALER6D_QOS2), + VC4_REG32(SCALER6D_PROF2), + VC4_REG32(SCALER6D_PRI_MAP0), + VC4_REG32(SCALER6D_PRI_MAP1), + VC4_REG32(SCALER6D_HISTCTRL), + VC4_REG32(SCALER6D_HISTBIN0), + VC4_REG32(SCALER6D_HISTBIN1), + VC4_REG32(SCALER6D_HISTBIN2), + VC4_REG32(SCALER6D_HISTBIN3), + VC4_REG32(SCALER6D_HISTBIN4), + VC4_REG32(SCALER6D_HISTBIN5), + VC4_REG32(SCALER6D_HISTBIN6), + VC4_REG32(SCALER6D_HISTBIN7), + VC4_REG32(SCALER6D_HVS_ID), +}; + void vc4_hvs_dump_state(struct vc4_hvs *hvs) { struct drm_device *drm = &hvs->vc4->base; @@ -145,6 +279,76 @@ static int vc4_hvs_debugfs_dlist(struct seq_file *m, void *data) return 0; } +static int vc6_hvs_debugfs_dlist(struct seq_file *m, void *data) +{ + struct drm_info_node *node = m->private; + struct drm_device *dev = node->minor->dev; + struct vc4_dev *vc4 = to_vc4_dev(dev); + struct vc4_hvs *hvs = vc4->hvs; + struct drm_printer p = drm_seq_file_printer(m); + unsigned int dlist_mem_size = hvs->dlist_mem_size; + unsigned int next_entry_start; + unsigned int i; + + for (i = 0; i < SCALER_CHANNELS_COUNT; i++) { + unsigned int active_dlist, dispstat; + unsigned int j; + + dispstat = VC4_GET_FIELD(HVS_READ(SCALER6_DISPX_STATUS(i)), + SCALER6_DISPX_STATUS_MODE); + if (dispstat == SCALER6_DISPX_STATUS_MODE_DISABLED || + dispstat == SCALER6_DISPX_STATUS_MODE_EOF) { + drm_printf(&p, "HVS chan %u disabled\n", i); + continue; + } + + drm_printf(&p, "HVS chan %u:\n", i); + + active_dlist = VC4_GET_FIELD(HVS_READ(SCALER6_DISPX_DL(i)), + SCALER6_DISPX_DL_LACT); + next_entry_start = 0; + + for (j = active_dlist; j < dlist_mem_size; j++) { + u32 dlist_word; + + dlist_word = readl((u32 __iomem *)vc4->hvs->dlist + j); + drm_printf(&p, "dlist: %02d: 0x%08x\n", j, + dlist_word); + if (!next_entry_start || + next_entry_start == j) { + if (dlist_word & SCALER_CTL0_END) + break; + next_entry_start = j + + VC4_GET_FIELD(dlist_word, + SCALER_CTL0_SIZE); + } + } + } + + return 0; +} + +static int vc6_hvs_debugfs_upm_allocs(struct seq_file *m, void *data) +{ + struct drm_debugfs_entry *entry = m->private; + struct drm_device *dev = entry->dev; + struct vc4_dev *vc4 = to_vc4_dev(dev); + struct vc4_hvs *hvs = vc4->hvs; + struct drm_printer p = drm_seq_file_printer(m); + struct vc4_upm_refcounts *refcount; + unsigned int i; + + drm_printf(&p, "UPM Handles:\n"); + for (i = 1; i <= VC4_NUM_UPM_HANDLES; i++) { + refcount = &hvs->upm_refcounts[i]; + drm_printf(&p, "handle %u: refcount %u, size %zu [%08llx + %08llx]\n", + i, refcount_read(&refcount->refcount), refcount->size, + refcount->upm.start, refcount->upm.size); + } + + return 0; +} + /* The filter kernel is composed of dwords each containing 3 9-bit * signed integers packed next to each other. */ @@ -215,12 +419,15 @@ static int vc4_hvs_upload_linear_kernel(struct vc4_hvs *hvs, static void vc4_hvs_lut_load(struct vc4_hvs *hvs, struct vc4_crtc *vc4_crtc) { - struct drm_device *drm = &hvs->vc4->base; + struct vc4_dev *vc4 = hvs->vc4; + struct drm_device *drm = &vc4->base; struct drm_crtc *crtc = &vc4_crtc->base; struct vc4_crtc_state *vc4_state = to_vc4_crtc_state(crtc->state); int idx; u32 i; + WARN_ON_ONCE(vc4->gen > VC4_GEN_5); + if (!drm_dev_enter(drm, &idx)) return; @@ -265,26 +472,57 @@ static void vc4_hvs_update_gamma_lut(struct vc4_hvs *hvs, u8 vc4_hvs_get_fifo_frame_count(struct vc4_hvs *hvs, unsigned int fifo) { - struct drm_device *drm = &hvs->vc4->base; + struct vc4_dev *vc4 = hvs->vc4; + struct drm_device *drm = &vc4->base; u8 field = 0; int idx; + WARN_ON_ONCE(vc4->gen > VC4_GEN_6_D); + if (!drm_dev_enter(drm, &idx)) return 0; - switch (fifo) { - case 0: - field = VC4_GET_FIELD(HVS_READ(SCALER_DISPSTAT1), - SCALER_DISPSTAT1_FRCNT0); + switch (vc4->gen) { + case VC4_GEN_6_C: + case VC4_GEN_6_D: + field = VC4_GET_FIELD(HVS_READ(SCALER6_DISPX_STATUS(fifo)), + SCALER6_DISPX_STATUS_FRCNT); break; - case 1: - field = VC4_GET_FIELD(HVS_READ(SCALER_DISPSTAT1), - SCALER_DISPSTAT1_FRCNT1); + case VC4_GEN_5: + switch (fifo) { + case 0: + field = VC4_GET_FIELD(HVS_READ(SCALER_DISPSTAT1), + SCALER5_DISPSTAT1_FRCNT0); + break; + case 1: + field = VC4_GET_FIELD(HVS_READ(SCALER_DISPSTAT1), + SCALER5_DISPSTAT1_FRCNT1); + break; + case 2: + field = VC4_GET_FIELD(HVS_READ(SCALER_DISPSTAT2), + SCALER5_DISPSTAT2_FRCNT2); + break; + } break; - case 2: - field = VC4_GET_FIELD(HVS_READ(SCALER_DISPSTAT2), - SCALER_DISPSTAT2_FRCNT2); + case VC4_GEN_4: + switch (fifo) { + case 0: + field = VC4_GET_FIELD(HVS_READ(SCALER_DISPSTAT1), + SCALER_DISPSTAT1_FRCNT0); + break; + case 1: + field = VC4_GET_FIELD(HVS_READ(SCALER_DISPSTAT1), + SCALER_DISPSTAT1_FRCNT1); + break; + case 2: + field = VC4_GET_FIELD(HVS_READ(SCALER_DISPSTAT2), + SCALER_DISPSTAT2_FRCNT2); + break; + } break; + default: + drm_err(drm, "Unknown VC4 generation: %d", vc4->gen); + return 0; } drm_dev_exit(idx); @@ -297,6 +535,8 @@ int vc4_hvs_get_fifo_from_output(struct vc4_hvs *hvs, unsigned int output) u32 reg; int ret; + WARN_ON_ONCE(vc4->gen > VC4_GEN_6_D); + switch (vc4->gen) { case VC4_GEN_4: return output; @@ -352,6 +592,24 @@ int vc4_hvs_get_fifo_from_output(struct vc4_hvs *hvs, unsigned int output) return -EPIPE; } + case VC4_GEN_6_C: + case VC4_GEN_6_D: + switch (output) { + case 0: + return 0; + + case 2: + return 2; + + case 1: + case 3: + case 4: + return 1; + + default: + return -EPIPE; + } + default: return -EPIPE; } @@ -370,6 +628,8 @@ static int vc4_hvs_init_channel(struct vc4_hvs *hvs, struct drm_crtc *crtc, u32 dispctrl; int idx; + WARN_ON_ONCE(vc4->gen > VC4_GEN_5); + if (!drm_dev_enter(drm, &idx)) return -ENODEV; @@ -420,11 +680,50 @@ static int vc4_hvs_init_channel(struct vc4_hvs *hvs, struct drm_crtc *crtc, return 0; } -void vc4_hvs_stop_channel(struct vc4_hvs *hvs, unsigned int chan) +static int vc6_hvs_init_channel(struct vc4_hvs *hvs, struct drm_crtc *crtc, + struct drm_display_mode *mode, bool oneshot) { - struct drm_device *drm = &hvs->vc4->base; + struct vc4_dev *vc4 = hvs->vc4; + struct drm_device *drm = &vc4->base; + struct vc4_crtc_state *vc4_crtc_state = to_vc4_crtc_state(crtc->state); + unsigned int chan = vc4_crtc_state->assigned_channel; + bool interlace = mode->flags & DRM_MODE_FLAG_INTERLACE; + u32 disp_ctrl1; + int idx; + + WARN_ON_ONCE(vc4->gen < VC4_GEN_6_C); + + if (!drm_dev_enter(drm, &idx)) + return -ENODEV; + + HVS_WRITE(SCALER6_DISPX_CTRL0(chan), SCALER6_DISPX_CTRL0_RESET); + + disp_ctrl1 = HVS_READ(SCALER6_DISPX_CTRL1(chan)); + disp_ctrl1 &= ~SCALER6_DISPX_CTRL1_INTLACE; + HVS_WRITE(SCALER6_DISPX_CTRL1(chan), + disp_ctrl1 | (interlace ? SCALER6_DISPX_CTRL1_INTLACE : 0)); + + HVS_WRITE(SCALER6_DISPX_CTRL0(chan), + SCALER6_DISPX_CTRL0_ENB | + VC4_SET_FIELD(mode->hdisplay - 1, + SCALER6_DISPX_CTRL0_FWIDTH) | + (oneshot ? SCALER6_DISPX_CTRL0_ONESHOT : 0) | + VC4_SET_FIELD(mode->vdisplay - 1, + SCALER6_DISPX_CTRL0_LINES)); + + drm_dev_exit(idx); + + return 0; +} + +static void __vc4_hvs_stop_channel(struct vc4_hvs *hvs, unsigned int chan) +{ + struct vc4_dev *vc4 = hvs->vc4; + struct drm_device *drm = &vc4->base; int idx; + WARN_ON_ONCE(vc4->gen > VC4_GEN_5); + if (!drm_dev_enter(drm, &idx)) return; @@ -449,6 +748,44 @@ out: drm_dev_exit(idx); } +static void __vc6_hvs_stop_channel(struct vc4_hvs *hvs, unsigned int chan) +{ + struct vc4_dev *vc4 = hvs->vc4; + struct drm_device *drm = &vc4->base; + int idx; + + WARN_ON_ONCE(vc4->gen < VC4_GEN_6_C); + + if (!drm_dev_enter(drm, &idx)) + return; + + if (!(HVS_READ(SCALER6_DISPX_CTRL0(chan)) & SCALER6_DISPX_CTRL0_ENB)) + goto out; + + HVS_WRITE(SCALER6_DISPX_CTRL0(chan), + HVS_READ(SCALER6_DISPX_CTRL0(chan)) | SCALER6_DISPX_CTRL0_RESET); + + HVS_WRITE(SCALER6_DISPX_CTRL0(chan), + HVS_READ(SCALER6_DISPX_CTRL0(chan)) & ~SCALER6_DISPX_CTRL0_ENB); + + WARN_ON_ONCE(VC4_GET_FIELD(HVS_READ(SCALER6_DISPX_STATUS(chan)), + SCALER6_DISPX_STATUS_MODE) != + SCALER6_DISPX_STATUS_MODE_DISABLED); + +out: + drm_dev_exit(idx); +} + +void vc4_hvs_stop_channel(struct vc4_hvs *hvs, unsigned int chan) +{ + struct vc4_dev *vc4 = hvs->vc4; + + if (vc4->gen >= VC4_GEN_6_C) + __vc6_hvs_stop_channel(hvs, chan); + else + __vc4_hvs_stop_channel(hvs, chan); +} + int vc4_hvs_atomic_check(struct drm_crtc *crtc, struct drm_atomic_state *state) { struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(state, crtc); @@ -505,8 +842,13 @@ static void vc4_hvs_install_dlist(struct drm_crtc *crtc) if (!drm_dev_enter(dev, &idx)) return; - HVS_WRITE(SCALER_DISPLISTX(vc4_state->assigned_channel), - vc4_state->mm.start); + if (vc4->gen >= VC4_GEN_6_C) + HVS_WRITE(SCALER6_DISPX_LPTRS(vc4_state->assigned_channel), + VC4_SET_FIELD(vc4_state->mm.start, + SCALER6_DISPX_LPTRS_HEADE)); + else + HVS_WRITE(SCALER_DISPLISTX(vc4_state->assigned_channel), + vc4_state->mm.start); drm_dev_exit(idx); } @@ -561,7 +903,11 @@ void vc4_hvs_atomic_enable(struct drm_crtc *crtc, vc4_hvs_install_dlist(crtc); vc4_hvs_update_dlist(crtc); - vc4_hvs_init_channel(vc4->hvs, crtc, mode, oneshot); + + if (vc4->gen >= VC4_GEN_6_C) + vc6_hvs_init_channel(vc4->hvs, crtc, mode, oneshot); + else + vc4_hvs_init_channel(vc4->hvs, crtc, mode, oneshot); } void vc4_hvs_atomic_disable(struct drm_crtc *crtc, @@ -590,13 +936,15 @@ void vc4_hvs_atomic_flush(struct drm_crtc *crtc, struct drm_plane *plane; struct vc4_plane_state *vc4_plane_state; bool debug_dump_regs = false; - bool enable_bg_fill = false; + bool enable_bg_fill = true; u32 __iomem *dlist_start = vc4->hvs->dlist + vc4_state->mm.start; u32 __iomem *dlist_next = dlist_start; unsigned int zpos = 0; bool found = false; int idx; + WARN_ON_ONCE(vc4->gen > VC4_GEN_6_D); + if (!drm_dev_enter(dev, &idx)) { vc4_crtc_send_vblank(crtc); return; @@ -645,13 +993,26 @@ void vc4_hvs_atomic_flush(struct drm_crtc *crtc, WARN_ON_ONCE(dlist_next - dlist_start != vc4_state->mm.size); - if (enable_bg_fill) + if (vc4->gen >= VC4_GEN_6_C) { /* This sets a black background color fill, as is the case * with other DRM drivers. */ + if (enable_bg_fill) + HVS_WRITE(SCALER6_DISPX_CTRL1(channel), + HVS_READ(SCALER6_DISPX_CTRL1(channel)) | + SCALER6_DISPX_CTRL1_BGENB); + else + HVS_WRITE(SCALER6_DISPX_CTRL1(channel), + HVS_READ(SCALER6_DISPX_CTRL1(channel)) & + ~SCALER6_DISPX_CTRL1_BGENB); + } else { + /* we can actually run with a lower core clock when background + * fill is enabled on VC4_GEN_5 so leave it enabled always. + */ HVS_WRITE(SCALER_DISPBKGNDX(channel), HVS_READ(SCALER_DISPBKGNDX(channel)) | SCALER_DISPBKGND_FILL); + } /* Only update DISPLIST if the CRTC was already running and is not * being disabled. @@ -668,6 +1029,8 @@ void vc4_hvs_atomic_flush(struct drm_crtc *crtc, if (crtc->state->color_mgmt_changed) { u32 dispbkgndx = HVS_READ(SCALER_DISPBKGNDX(channel)); + WARN_ON_ONCE(vc4->gen > VC4_GEN_5); + if (crtc->state->gamma_lut) { vc4_hvs_update_gamma_lut(hvs, vc4_crtc); dispbkgndx |= SCALER_DISPBKGND_GAMMA; @@ -697,6 +1060,8 @@ void vc4_hvs_mask_underrun(struct vc4_hvs *hvs, int channel) u32 dispctrl; int idx; + WARN_ON(vc4->gen > VC4_GEN_5); + if (!drm_dev_enter(drm, &idx)) return; @@ -717,6 +1082,8 @@ void vc4_hvs_unmask_underrun(struct vc4_hvs *hvs, int channel) u32 dispctrl; int idx; + WARN_ON(vc4->gen > VC4_GEN_5); + if (!drm_dev_enter(drm, &idx)) return; @@ -751,6 +1118,8 @@ static irqreturn_t vc4_hvs_irq_handler(int irq, void *data) u32 status; u32 dspeislur; + WARN_ON(vc4->gen > VC4_GEN_5); + /* * NOTE: We don't need to protect the register access using * drm_dev_enter() there because the interrupt handler lifetime @@ -802,7 +1171,12 @@ int vc4_hvs_debugfs_init(struct drm_minor *minor) minor->debugfs_root, &vc4->load_tracker_enabled); - drm_debugfs_add_file(drm, "hvs_dlists", vc4_hvs_debugfs_dlist, NULL); + if (vc4->gen >= VC4_GEN_6_C) { + drm_debugfs_add_file(drm, "hvs_dlists", vc6_hvs_debugfs_dlist, NULL); + drm_debugfs_add_file(drm, "hvs_upm", vc6_hvs_debugfs_upm_allocs, NULL); + } else { + drm_debugfs_add_file(drm, "hvs_dlists", vc4_hvs_debugfs_dlist, NULL); + } drm_debugfs_add_file(drm, "hvs_underrun", vc4_hvs_debugfs_underrun, NULL); @@ -817,6 +1191,10 @@ struct vc4_hvs *__vc4_hvs_alloc(struct vc4_dev *vc4, { struct drm_device *drm = &vc4->base; struct vc4_hvs *hvs; + unsigned int dlist_start; + size_t dlist_size; + size_t lbm_size; + unsigned int i; hvs = drmm_kzalloc(drm, sizeof(*hvs), GFP_KERNEL); if (!hvs) @@ -828,27 +1206,94 @@ struct vc4_hvs *__vc4_hvs_alloc(struct vc4_dev *vc4, spin_lock_init(&hvs->mm_lock); - /* Set up the HVS display list memory manager. We never - * overwrite the setup from the bootloader (just 128b out of - * our 16K), since we don't want to scramble the screen when - * transitioning from the firmware's boot setup to runtime. - */ - hvs->dlist_mem_size = (SCALER_DLIST_SIZE >> 2) - HVS_BOOTLOADER_DLIST_END; - drm_mm_init(&hvs->dlist_mm, - HVS_BOOTLOADER_DLIST_END, - hvs->dlist_mem_size); + switch (vc4->gen) { + case VC4_GEN_4: + case VC4_GEN_5: + /* Set up the HVS display list memory manager. We never + * overwrite the setup from the bootloader (just 128b + * out of our 16K), since we don't want to scramble the + * screen when transitioning from the firmware's boot + * setup to runtime. + */ + dlist_start = HVS_BOOTLOADER_DLIST_END; + dlist_size = (SCALER_DLIST_SIZE >> 2) - HVS_BOOTLOADER_DLIST_END; + break; + + case VC4_GEN_6_C: + case VC4_GEN_6_D: + dlist_start = HVS_BOOTLOADER_DLIST_END; + + /* + * If we are running a test, it means that we can't + * access a register. Use a plausible size then. + */ + if (!kunit_get_current_test()) + dlist_size = HVS_READ(SCALER6_CXM_SIZE); + else + dlist_size = 4096; + + for (i = 0; i < VC4_NUM_UPM_HANDLES; i++) { + refcount_set(&hvs->upm_refcounts[i].refcount, 0); + hvs->upm_refcounts[i].hvs = hvs; + } + + break; + + default: + drm_err(drm, "Unknown VC4 generation: %d", vc4->gen); + return ERR_PTR(-ENODEV); + } + + drm_mm_init(&hvs->dlist_mm, dlist_start, dlist_size); + + hvs->dlist_mem_size = dlist_size; /* Set up the HVS LBM memory manager. We could have some more * complicated data structure that allowed reuse of LBM areas * between planes when they don't overlap on the screen, but * for now we just allocate globally. */ - if (vc4->gen == VC4_GEN_4) + + switch (vc4->gen) { + case VC4_GEN_4: /* 48k words of 2x12-bit pixels */ - drm_mm_init(&hvs->lbm_mm, 0, 48 * 1024); - else + lbm_size = 48 * SZ_1K; + break; + + case VC4_GEN_5: /* 60k words of 4x12-bit pixels */ - drm_mm_init(&hvs->lbm_mm, 0, 60 * 1024); + lbm_size = 60 * SZ_1K; + break; + + case VC4_GEN_6_C: + case VC4_GEN_6_D: + /* + * If we are running a test, it means that we can't + * access a register. Use a plausible size then. + */ + lbm_size = 1024; + break; + + default: + drm_err(drm, "Unknown VC4 generation: %d", vc4->gen); + return ERR_PTR(-ENODEV); + } + + drm_mm_init(&hvs->lbm_mm, 0, lbm_size); + + if (vc4->gen >= VC4_GEN_6_C) { + ida_init(&hvs->upm_handles); + + /* + * NOTE: On BCM2712, the size can also be read through + * the SCALER_UBM_SIZE register. We would need to do a + * register access though, which we can't do with kunit + * that also uses this function to create its mock + * device. + */ + drm_mm_init(&hvs->upm_mm, 0, 1024 * HVS_UBM_WORD_SIZE); + } + vc4->hvs = hvs; @@ -945,10 +1390,150 @@ static int vc4_hvs_hw_init(struct vc4_hvs *hvs) return 0; } +#define CFC1_N_NL_CSC_CTRL(x) (0xa000 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C00(x) (0xa008 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C01(x) (0xa00c + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C02(x) (0xa010 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C03(x) (0xa014 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C04(x) (0xa018 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C10(x) (0xa01c + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C11(x) (0xa020 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C12(x) (0xa024 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C13(x) (0xa028 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C14(x) (0xa02c + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C20(x) (0xa030 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C21(x) (0xa034 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C22(x) (0xa038 + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C23(x) (0xa03c + ((x) * 0x3000)) +#define CFC1_N_MA_CSC_COEFF_C24(x) (0xa040 + ((x) * 0x3000)) + +#define SCALER_PI_CMP_CSC_RED0(x) (0x200 + ((x) * 0x40)) +#define SCALER_PI_CMP_CSC_RED1(x) (0x204 + ((x) * 0x40)) +#define SCALER_PI_CMP_CSC_RED_CLAMP(x) (0x208 + ((x) * 0x40)) +#define SCALER_PI_CMP_CSC_CFG(x) (0x20c + ((x) * 0x40)) +#define SCALER_PI_CMP_CSC_GREEN0(x) (0x210 + ((x) * 0x40)) +#define SCALER_PI_CMP_CSC_GREEN1(x) (0x214 + ((x) * 0x40)) +#define SCALER_PI_CMP_CSC_GREEN_CLAMP(x) (0x218 + ((x) * 0x40)) +#define SCALER_PI_CMP_CSC_BLUE0(x) (0x220 + ((x) * 0x40)) +#define SCALER_PI_CMP_CSC_BLUE1(x) (0x224 + ((x) * 0x40)) +#define SCALER_PI_CMP_CSC_BLUE_CLAMP(x) (0x228 + ((x) * 0x40)) + +/* 4 S2.22 multiplication factors, and 1 S9.15 addititive element for each of 3 + * output components + */ +struct vc6_csc_coeff_entry { + u32 csc[3][5]; +}; + +static const struct vc6_csc_coeff_entry csc_coeffs[2][3] = { + [DRM_COLOR_YCBCR_LIMITED_RANGE] = { + [DRM_COLOR_YCBCR_BT601] = { + .csc = { + { 0x004A8542, 0x0, 0x0066254A, 0x0, 0xFF908A0D }, + { 0x004A8542, 0xFFE6ED5D, 0xFFCBF856, 0x0, 0x0043C9A3 }, + { 0x004A8542, 0x00811A54, 0x0, 0x0, 0xFF759502 } + } + }, + [DRM_COLOR_YCBCR_BT709] = { + .csc = { + { 0x004A8542, 0x0, 0x0072BC44, 0x0, 0xFF83F312 }, + { 0x004A8542, 0xFFF25A22, 0xFFDDE4D0, 0x0, 0x00267064 }, + { 0x004A8542, 0x00873197, 0x0, 0x0, 0xFF6F7DC0 } + } + }, + [DRM_COLOR_YCBCR_BT2020] = { + .csc = { + { 0x004A8542, 0x0, 0x006B4A17, 0x0, 0xFF8B653F }, + { 0x004A8542, 0xFFF402D9, 0xFFDDE4D0, 0x0, 0x0024C7AE }, + { 0x004A8542, 0x008912CC, 0x0, 0x0, 0xFF6D9C8B } + } + } + }, + [DRM_COLOR_YCBCR_FULL_RANGE] = { + [DRM_COLOR_YCBCR_BT601] = { + .csc = { + { 0x00400000, 0x0, 0x0059BA5E, 0x0, 0xFFA645A1 }, + { 0x00400000, 0xFFE9F9AC, 0xFFD24B97, 0x0, 0x0043BABB }, + { 0x00400000, 0x00716872, 0x0, 0x0, 0xFF8E978D } + } + }, + [DRM_COLOR_YCBCR_BT709] = { + .csc = { + { 0x00400000, 0x0, 0x0064C985, 0x0, 0xFF9B367A }, + { 0x00400000, 0xFFF402E1, 0xFFE20A40, 0x0, 0x0029F2DE }, + { 0x00400000, 0x0076C226, 0x0, 0x0, 0xFF893DD9 } + } + }, + [DRM_COLOR_YCBCR_BT2020] = { + .csc = { + { 0x00400000, 0x0, 0x005E3F14, 0x0, 0xFFA1C0EB }, + { 0x00400000, 0xFFF577F6, 0xFFDB580F, 0x0, 0x002F2FFA }, + { 0x00400000, 0x007868DB, 0x0, 0x0, 0xFF879724 } + } + } + } +}; + +static int vc6_hvs_hw_init(struct vc4_hvs *hvs) +{ + const struct vc6_csc_coeff_entry *coeffs; + unsigned int i; + + HVS_WRITE(SCALER6_CONTROL, + SCALER6_CONTROL_HVS_EN | + VC4_SET_FIELD(8, SCALER6_CONTROL_PF_LINES) | + VC4_SET_FIELD(15, SCALER6_CONTROL_MAX_REQS)); + + /* Set HVS arbiter priority to max */ + HVS_WRITE(SCALER6(PRI_MAP0), 0xffffffff); + HVS_WRITE(SCALER6(PRI_MAP1), 0xffffffff); + + if (hvs->vc4->gen == VC4_GEN_6_C) { + for (i = 0; i < 6; i++) { + coeffs = &csc_coeffs[i / 3][i % 3]; + + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C00(i), coeffs->csc[0][0]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C01(i), coeffs->csc[0][1]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C02(i), coeffs->csc[0][2]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C03(i), coeffs->csc[0][3]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C04(i), coeffs->csc[0][4]); + + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C10(i), coeffs->csc[1][0]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C11(i), coeffs->csc[1][1]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C12(i), coeffs->csc[1][2]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C13(i), coeffs->csc[1][3]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C14(i), coeffs->csc[1][4]); + + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C20(i), coeffs->csc[2][0]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C21(i), coeffs->csc[2][1]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C22(i), coeffs->csc[2][2]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C23(i), coeffs->csc[2][3]); + HVS_WRITE(CFC1_N_MA_CSC_COEFF_C24(i), coeffs->csc[2][4]); + + HVS_WRITE(CFC1_N_NL_CSC_CTRL(i), BIT(15)); + } + } else { + for (i = 0; i < 8; i++) { + HVS_WRITE(SCALER_PI_CMP_CSC_RED0(i), 0x1f002566); + HVS_WRITE(SCALER_PI_CMP_CSC_RED1(i), 0x3994); + HVS_WRITE(SCALER_PI_CMP_CSC_RED_CLAMP(i), 0xfff00000); + HVS_WRITE(SCALER_PI_CMP_CSC_CFG(i), 0x1); + HVS_WRITE(SCALER_PI_CMP_CSC_GREEN0(i), 0x18002566); + HVS_WRITE(SCALER_PI_CMP_CSC_GREEN1(i), 0xf927eee2); + HVS_WRITE(SCALER_PI_CMP_CSC_GREEN_CLAMP(i), 0xfff00000); + HVS_WRITE(SCALER_PI_CMP_CSC_BLUE0(i), 0x18002566); + HVS_WRITE(SCALER_PI_CMP_CSC_BLUE1(i), 0x43d80000); + HVS_WRITE(SCALER_PI_CMP_CSC_BLUE_CLAMP(i), 0xfff00000); + } + } + + return 0; +} + static int vc4_hvs_cob_init(struct vc4_hvs *hvs) { struct vc4_dev *vc4 = hvs->vc4; - u32 reg, top; + u32 reg, top, base; /* * Recompute Composite Output Buffer (COB) allocations for the @@ -1009,6 +1594,32 @@ static int vc4_hvs_cob_init(struct vc4_hvs *hvs) HVS_WRITE(SCALER_DISPBASE0, reg); break; + case VC4_GEN_6_C: + case VC4_GEN_6_D: + #define VC6_COB_LINE_WIDTH 3840 + #define VC6_COB_NUM_LINES 4 + base = 0; + top = 3840; + + HVS_WRITE(SCALER6_DISPX_COB(2), + VC4_SET_FIELD(top, SCALER6_DISPX_COB_TOP) | + VC4_SET_FIELD(base, SCALER6_DISPX_COB_BASE)); + + base = top + 16; + top += VC6_COB_LINE_WIDTH * VC6_COB_NUM_LINES; + + HVS_WRITE(SCALER6_DISPX_COB(1), + VC4_SET_FIELD(top, SCALER6_DISPX_COB_TOP) | + VC4_SET_FIELD(base, SCALER6_DISPX_COB_BASE)); + + base = top + 16; + top += VC6_COB_LINE_WIDTH * VC6_COB_NUM_LINES; + + HVS_WRITE(SCALER6_DISPX_COB(0), + VC4_SET_FIELD(top, SCALER6_DISPX_COB_TOP) | + VC4_SET_FIELD(base, SCALER6_DISPX_COB_BASE)); + break; + default: return -EINVAL; } @@ -1034,10 +1645,23 @@ static int vc4_hvs_bind(struct device *dev, struct device *master, void *data) return PTR_ERR(hvs); hvs->regset.base = hvs->regs; - hvs->regset.regs = vc4_hvs_regs; - hvs->regset.nregs = ARRAY_SIZE(vc4_hvs_regs); - if (vc4->gen == VC4_GEN_5) { + if (vc4->gen == VC4_GEN_6_C) { + hvs->regset.regs = vc6_hvs_regs; + hvs->regset.nregs = ARRAY_SIZE(vc6_hvs_regs); + + if (VC4_GET_FIELD(HVS_READ(SCALER6_VERSION), SCALER6_VERSION) == + SCALER6_VERSION_D0) { + vc4->gen = VC4_GEN_6_D; + hvs->regset.regs = vc6_d_hvs_regs; + hvs->regset.nregs = ARRAY_SIZE(vc6_d_hvs_regs); + } + } else { + hvs->regset.regs = vc4_hvs_regs; + hvs->regset.nregs = ARRAY_SIZE(vc4_hvs_regs); + } + + if (vc4->gen >= VC4_GEN_5) { struct rpi_firmware *firmware; struct device_node *node; unsigned int max_rate; @@ -1051,12 +1675,20 @@ static int vc4_hvs_bind(struct device *dev, struct device *master, void *data) if (!firmware) return -EPROBE_DEFER; - hvs->core_clk = devm_clk_get(&pdev->dev, NULL); + hvs->core_clk = devm_clk_get(&pdev->dev, + (vc4->gen >= VC4_GEN_6_C) ? "core" : NULL); if (IS_ERR(hvs->core_clk)) { dev_err(&pdev->dev, "Couldn't get core clock\n"); return PTR_ERR(hvs->core_clk); } + hvs->disp_clk = devm_clk_get(&pdev->dev, + (vc4->gen >= VC4_GEN_6_C) ? "disp" : NULL); + if (IS_ERR(hvs->disp_clk)) { + dev_err(&pdev->dev, "Couldn't get disp clock\n"); + return PTR_ERR(hvs->disp_clk); + } + max_rate = rpi_firmware_clk_get_max_rate(firmware, RPI_FIRMWARE_CORE_CLK_ID); rpi_firmware_put(firmware); @@ -1073,14 +1705,23 @@ static int vc4_hvs_bind(struct device *dev, struct device *master, void *data) dev_err(&pdev->dev, "Couldn't enable the core clock\n"); return ret; } + + ret = clk_prepare_enable(hvs->disp_clk); + if (ret) { + dev_err(&pdev->dev, "Couldn't enable the disp clock\n"); + return ret; + } } - if (vc4->gen == VC4_GEN_4) - hvs->dlist = hvs->regs + SCALER_DLIST_START; - else + if (vc4->gen >= VC4_GEN_5) hvs->dlist = hvs->regs + SCALER5_DLIST_START; + else + hvs->dlist = hvs->regs + SCALER_DLIST_START; - ret = vc4_hvs_hw_init(hvs); + if (vc4->gen >= VC4_GEN_6_C) + ret = vc6_hvs_hw_init(hvs); + else + ret = vc4_hvs_hw_init(hvs); if (ret) return ret; @@ -1097,10 +1738,12 @@ static int vc4_hvs_bind(struct device *dev, struct device *master, void *data) if (ret) return ret; - ret = devm_request_irq(dev, platform_get_irq(pdev, 0), - vc4_hvs_irq_handler, 0, "vc4 hvs", drm); - if (ret) - return ret; + if (vc4->gen < VC4_GEN_6_C) { + ret = devm_request_irq(dev, platform_get_irq(pdev, 0), + vc4_hvs_irq_handler, 0, "vc4 hvs", drm); + if (ret) + return ret; + } return 0; } @@ -1125,6 +1768,7 @@ static void vc4_hvs_unbind(struct device *dev, struct device *master, drm_mm_remove_node(node); drm_mm_takedown(&vc4->hvs->lbm_mm); + clk_disable_unprepare(hvs->disp_clk); clk_disable_unprepare(hvs->core_clk); vc4->hvs = NULL; @@ -1147,6 +1791,7 @@ static void vc4_hvs_dev_remove(struct platform_device *pdev) static const struct of_device_id vc4_hvs_dt_match[] = { { .compatible = "brcm,bcm2711-hvs" }, + { .compatible = "brcm,bcm2712-hvs" }, { .compatible = "brcm,bcm2835-hvs" }, {} }; diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 58bbb9efc2df..f5b167417428 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -138,6 +138,8 @@ vc4_ctm_commit(struct vc4_dev *vc4, struct drm_atomic_state *state) struct vc4_ctm_state *ctm_state = to_vc4_ctm_state(vc4->ctm_manager.state); struct drm_color_ctm *ctm = ctm_state->ctm; + WARN_ON_ONCE(vc4->gen > VC4_GEN_5); + if (ctm_state->fifo) { HVS_WRITE(SCALER_OLEDCOEF2, VC4_SET_FIELD(vc4_ctm_s31_32_to_s0_9(ctm->matrix[0]), @@ -213,6 +215,8 @@ static void vc4_hvs_pv_muxing_commit(struct vc4_dev *vc4, struct drm_crtc *crtc; unsigned int i; + WARN_ON_ONCE(vc4->gen != VC4_GEN_4); + for_each_new_crtc_in_state(state, crtc, crtc_state, i) { struct vc4_crtc *vc4_crtc = to_vc4_crtc(crtc); struct vc4_crtc_state *vc4_state = to_vc4_crtc_state(crtc_state); @@ -256,6 +260,8 @@ static void vc5_hvs_pv_muxing_commit(struct vc4_dev *vc4, unsigned int i; u32 reg; + WARN_ON_ONCE(vc4->gen != VC4_GEN_5); + for_each_new_crtc_in_state(state, crtc, crtc_state, i) { struct vc4_crtc_state *vc4_state = to_vc4_crtc_state(crtc_state); struct vc4_crtc *vc4_crtc = to_vc4_crtc(crtc); @@ -320,17 +326,62 @@ static void vc5_hvs_pv_muxing_commit(struct vc4_dev *vc4, } } +static void vc6_hvs_pv_muxing_commit(struct vc4_dev *vc4, + struct drm_atomic_state *state) +{ + struct vc4_hvs *hvs = vc4->hvs; + struct drm_crtc_state *crtc_state; + struct drm_crtc *crtc; + unsigned int i; + + WARN_ON_ONCE(vc4->gen != VC4_GEN_6_C && vc4->gen != VC4_GEN_6_D); + + for_each_new_crtc_in_state(state, crtc, crtc_state, i) { + struct vc4_crtc_state *vc4_state = to_vc4_crtc_state(crtc_state); + struct vc4_encoder *vc4_encoder; + struct drm_encoder *encoder; + unsigned char mux; + u32 reg; + + if (!vc4_state->update_muxing) + continue; + + if (vc4_state->assigned_channel != 1) + continue; + + encoder = vc4_get_crtc_encoder(crtc, crtc_state); + vc4_encoder = to_vc4_encoder(encoder); + switch (vc4_encoder->type) { + case VC4_ENCODER_TYPE_HDMI1: + mux = 0; + break; + + case VC4_ENCODER_TYPE_TXP1: + mux = 2; + break; + + default: + drm_err(&vc4->base, "Unhandled encoder type for PV muxing %d", + vc4_encoder->type); + mux = 0; + break; + } + + reg = HVS_READ(SCALER6_CONTROL); + HVS_WRITE(SCALER6_CONTROL, + (reg & ~SCALER6_CONTROL_DSP1_TARGET_MASK) | + VC4_SET_FIELD(mux, SCALER6_CONTROL_DSP1_TARGET)); + } +} + static void vc4_atomic_commit_tail(struct drm_atomic_state *state) { struct drm_device *dev = state->dev; struct vc4_dev *vc4 = to_vc4_dev(dev); struct vc4_hvs *hvs = vc4->hvs; - struct drm_crtc_state *new_crtc_state; struct vc4_hvs_state *new_hvs_state; - struct drm_crtc *crtc; struct vc4_hvs_state *old_hvs_state; unsigned int channel; - int i; old_hvs_state = vc4_hvs_get_old_global_state(state); if (WARN_ON(IS_ERR(old_hvs_state))) @@ -340,14 +391,20 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) if (WARN_ON(IS_ERR(new_hvs_state))) return; - for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { - struct vc4_crtc_state *vc4_crtc_state; + if (vc4->gen < VC4_GEN_6_C) { + struct drm_crtc_state *new_crtc_state; + struct drm_crtc *crtc; + int i; - if (!new_crtc_state->commit) - continue; + for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { + struct vc4_crtc_state *vc4_crtc_state; + + if (!new_crtc_state->commit) + continue; - vc4_crtc_state = to_vc4_crtc_state(new_crtc_state); - vc4_hvs_mask_underrun(hvs, vc4_crtc_state->assigned_channel); + vc4_crtc_state = to_vc4_crtc_state(new_crtc_state); + vc4_hvs_mask_underrun(hvs, vc4_crtc_state->assigned_channel); + } } for (channel = 0; channel < HVS_NUM_CHANNELS; channel++) { @@ -382,16 +439,32 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) * modeset. */ WARN_ON(clk_set_min_rate(hvs->core_clk, core_rate)); + WARN_ON(clk_set_min_rate(hvs->disp_clk, core_rate)); } drm_atomic_helper_commit_modeset_disables(dev, state); - vc4_ctm_commit(vc4, state); + if (vc4->gen <= VC4_GEN_5) + vc4_ctm_commit(vc4, state); - if (vc4->gen == VC4_GEN_5) - vc5_hvs_pv_muxing_commit(vc4, state); - else + switch (vc4->gen) { + case VC4_GEN_4: vc4_hvs_pv_muxing_commit(vc4, state); + break; + + case VC4_GEN_5: + vc5_hvs_pv_muxing_commit(vc4, state); + break; + + case VC4_GEN_6_C: + case VC4_GEN_6_D: + vc6_hvs_pv_muxing_commit(vc4, state); + break; + + default: + drm_err(dev, "Unknown VC4 generation: %d", vc4->gen); + break; + } drm_atomic_helper_commit_planes(dev, state, DRM_PLANE_COMMIT_ACTIVE_ONLY); @@ -418,6 +491,7 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) * requirements. */ WARN_ON(clk_set_min_rate(hvs->core_clk, core_rate)); + WARN_ON(clk_set_min_rate(hvs->disp_clk, core_rate)); drm_dbg(dev, "Core clock actual rate: %lu Hz\n", clk_get_rate(hvs->core_clk)); @@ -1056,7 +1130,10 @@ int vc4_kms_load(struct drm_device *dev) return ret; } - if (vc4->gen == VC4_GEN_5) { + if (vc4->gen >= VC4_GEN_6_C) { + dev->mode_config.max_width = 8192; + dev->mode_config.max_height = 8192; + } else if (vc4->gen >= VC4_GEN_5) { dev->mode_config.max_width = 7680; dev->mode_config.max_height = 7680; } else { diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c index ba6e86d62a77..94737c587f20 100644 --- a/drivers/gpu/drm/vc4/vc4_plane.c +++ b/drivers/gpu/drm/vc4/vc4_plane.c @@ -278,7 +278,10 @@ static bool plane_enabled(struct drm_plane_state *state) static struct drm_plane_state *vc4_plane_duplicate_state(struct drm_plane *plane) { + struct vc4_dev *vc4 = to_vc4_dev(plane->dev); + struct vc4_hvs *hvs = vc4->hvs; struct vc4_plane_state *vc4_state; + unsigned int i; if (WARN_ON(!plane->state)) return NULL; @@ -288,6 +291,12 @@ static struct drm_plane_state *vc4_plane_duplicate_state(struct drm_plane *plane return NULL; memset(&vc4_state->lbm, 0, sizeof(vc4_state->lbm)); + + for (i = 0; i < DRM_FORMAT_MAX_PLANES; i++) { + if (vc4_state->upm_handle[i]) + refcount_inc(&hvs->upm_refcounts[vc4_state->upm_handle[i]].refcount); + } + vc4_state->dlist_initialized = 0; __drm_atomic_helper_plane_duplicate_state(plane, &vc4_state->base); @@ -306,18 +315,47 @@ static struct drm_plane_state *vc4_plane_duplicate_state(struct drm_plane *plane return &vc4_state->base; } +static void vc4_plane_release_upm_ida(struct vc4_hvs *hvs, unsigned int upm_handle) +{ + struct vc4_upm_refcounts *refcount = &hvs->upm_refcounts[upm_handle]; + unsigned long irqflags; + + spin_lock_irqsave(&hvs->mm_lock, irqflags); + drm_mm_remove_node(&refcount->upm); + spin_unlock_irqrestore(&hvs->mm_lock, irqflags); + refcount->upm.start = 0; + refcount->upm.size = 0; + refcount->size = 0; + + ida_free(&hvs->upm_handles, upm_handle); +} + static void vc4_plane_destroy_state(struct drm_plane *plane, struct drm_plane_state *state) { struct vc4_dev *vc4 = to_vc4_dev(plane->dev); + struct vc4_hvs *hvs = vc4->hvs; struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + unsigned int i; if (drm_mm_node_allocated(&vc4_state->lbm)) { unsigned long irqflags; - spin_lock_irqsave(&vc4->hvs->mm_lock, irqflags); + spin_lock_irqsave(&hvs->mm_lock, irqflags); drm_mm_remove_node(&vc4_state->lbm); - spin_unlock_irqrestore(&vc4->hvs->mm_lock, irqflags); + spin_unlock_irqrestore(&hvs->mm_lock, irqflags); + } + + for (i = 0; i < DRM_FORMAT_MAX_PLANES; i++) { + struct vc4_upm_refcounts *refcount; + + if (!vc4_state->upm_handle[i]) + continue; + + refcount = &hvs->upm_refcounts[vc4_state->upm_handle[i]]; + + if (refcount_dec_and_test(&refcount->refcount)) + vc4_plane_release_upm_ida(hvs, vc4_state->upm_handle[i]); } kfree(vc4_state->dlist); @@ -528,8 +566,11 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state) static void vc4_write_tpz(struct vc4_plane_state *vc4_state, u32 src, u32 dst) { + struct vc4_dev *vc4 = to_vc4_dev(vc4_state->base.plane->dev); u32 scale, recip; + WARN_ON_ONCE(vc4->gen > VC4_GEN_6_D); + scale = src / dst; /* The specs note that while the reciprocal would be defined @@ -538,6 +579,11 @@ static void vc4_write_tpz(struct vc4_plane_state *vc4_state, u32 src, u32 dst) recip = ~0 / scale; vc4_dlist_write(vc4_state, + /* + * The BCM2712 is lacking BIT(31) compared to + * the previous generations, but we don't use + * it. + */ VC4_SET_FIELD(scale, SCALER_TPZ0_SCALE) | VC4_SET_FIELD(0, SCALER_TPZ0_IPHASE)); vc4_dlist_write(vc4_state, @@ -550,10 +596,13 @@ static void vc4_write_tpz(struct vc4_plane_state *vc4_state, u32 src, u32 dst) static void vc4_write_ppf(struct vc4_plane_state *vc4_state, u32 src, u32 dst, u32 xy, int channel) { + struct vc4_dev *vc4 = to_vc4_dev(vc4_state->base.plane->dev); u32 scale = src / dst; s32 offset, offset2; s32 phase; + WARN_ON_ONCE(vc4->gen > VC4_GEN_6_D); + /* * Start the phase at 1/2 pixel from the 1st pixel at src_x. * 1/4 pixel for YUV. @@ -598,10 +647,15 @@ static void vc4_write_ppf(struct vc4_plane_state *vc4_state, u32 src, u32 dst, vc4_dlist_write(vc4_state, SCALER_PPF_AGC | VC4_SET_FIELD(scale, SCALER_PPF_SCALE) | + /* + * The register layout documentation is slightly + * different to setup the phase in the BCM2712, + * but they seem equivalent. + */ VC4_SET_FIELD(phase, SCALER_PPF_IPHASE)); } -static u32 vc4_lbm_size(struct drm_plane_state *state) +static u32 __vc4_lbm_size(struct drm_plane_state *state) { struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); struct vc4_dev *vc4 = to_vc4_dev(state->plane->dev); @@ -649,11 +703,139 @@ static u32 vc4_lbm_size(struct drm_plane_state *state) return lbm; } +static unsigned int vc4_lbm_words_per_component(const struct drm_plane_state *state, + unsigned int channel) +{ + const struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + + switch (vc4_state->y_scaling[channel]) { + case VC4_SCALING_PPF: + return 4; + + case VC4_SCALING_TPZ: + return 2; + + default: + return 0; + } +} + +static unsigned int vc4_lbm_components(const struct drm_plane_state *state, + unsigned int channel) +{ + const struct drm_format_info *info = state->fb->format; + const struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + + if (vc4_state->y_scaling[channel] == VC4_SCALING_NONE) + return 0; + + if (info->is_yuv) + return channel ? 2 : 1; + + if (info->has_alpha) + return 4; + + return 3; +} + +static unsigned int vc4_lbm_channel_size(const struct drm_plane_state *state, + unsigned int channel) +{ + const struct drm_format_info *info = state->fb->format; + const struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + unsigned int channels_scaled = 0; + unsigned int components, words, wpc; + unsigned int width, lines; + unsigned int i; + + /* LBM is meant to use the smaller of source or dest width, but there + * is a issue with UV scaling that the size required for the second + * channel is based on the source width only. + */ + if (info->hsub > 1 && channel == 1) + width = state->src_w >> 16; + else + width = min(state->src_w >> 16, state->crtc_w); + width = round_up(width / info->hsub, 4); + + wpc = vc4_lbm_words_per_component(state, channel); + if (!wpc) + return 0; + + components = vc4_lbm_components(state, channel); + if (!components) + return 0; + + if (state->alpha != DRM_BLEND_ALPHA_OPAQUE && info->has_alpha) + components -= 1; + + words = width * wpc * components; + + lines = DIV_ROUND_UP(words, 128 / info->hsub); + + for (i = 0; i < 2; i++) + if (vc4_state->y_scaling[channel] != VC4_SCALING_NONE) + channels_scaled++; + + if (channels_scaled == 1) + lines = lines / 2; + + return lines; +} + +static unsigned int __vc6_lbm_size(const struct drm_plane_state *state) +{ + const struct drm_format_info *info = state->fb->format; + + if (info->hsub > 1) + return max(vc4_lbm_channel_size(state, 0), + vc4_lbm_channel_size(state, 1)); + else + return vc4_lbm_channel_size(state, 0); +} + +static u32 vc4_lbm_size(struct drm_plane_state *state) +{ + struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + struct vc4_dev *vc4 = to_vc4_dev(state->plane->dev); + + /* LBM is not needed when there's no vertical scaling. */ + if (vc4_state->y_scaling[0] == VC4_SCALING_NONE && + vc4_state->y_scaling[1] == VC4_SCALING_NONE) + return 0; + + if (vc4->gen >= VC4_GEN_6_C) + return __vc6_lbm_size(state); + else + return __vc4_lbm_size(state); +} + +static size_t vc6_upm_size(const struct drm_plane_state *state, + unsigned int plane) +{ + const struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + unsigned int stride = state->fb->pitches[plane]; + + /* + * TODO: This only works for raster formats, and is sub-optimal + * for buffers with a stride aligned on 32 bytes. + */ + unsigned int words_per_line = (stride + 62) / 32; + unsigned int fetch_region_size = words_per_line * 32; + unsigned int buffer_lines = 2 << vc4_state->upm_buffer_lines; + unsigned int buffer_size = fetch_region_size * buffer_lines; + + return ALIGN(buffer_size, HVS_UBM_WORD_SIZE); +} + static void vc4_write_scaling_parameters(struct drm_plane_state *state, int channel) { + struct vc4_dev *vc4 = to_vc4_dev(state->plane->dev); struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + WARN_ON_ONCE(vc4->gen > VC4_GEN_6_D); + /* Ch0 H-PPF Word 0: Scaling Parameters */ if (vc4_state->x_scaling[channel] == VC4_SCALING_PPF) { vc4_write_ppf(vc4_state, vc4_state->src_w[channel], @@ -750,6 +932,10 @@ static int vc4_plane_allocate_lbm(struct drm_plane_state *state) if (!lbm_size) return 0; + /* + * NOTE: BCM2712 doesn't need to be aligned, since the size + * returned by vc4_lbm_size() is in words already. + */ if (vc4->gen == VC4_GEN_5) lbm_size = ALIGN(lbm_size, 64); else if (vc4->gen == VC4_GEN_4) @@ -787,6 +973,108 @@ static int vc4_plane_allocate_lbm(struct drm_plane_state *state) return 0; } +static int vc6_plane_allocate_upm(struct drm_plane_state *state) +{ + const struct drm_format_info *info = state->fb->format; + struct drm_device *drm = state->plane->dev; + struct vc4_dev *vc4 = to_vc4_dev(drm); + struct vc4_hvs *hvs = vc4->hvs; + struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + unsigned int i; + int ret; + + WARN_ON_ONCE(vc4->gen < VC4_GEN_6_C); + + vc4_state->upm_buffer_lines = SCALER6_PTR0_UPM_BUFF_SIZE_2_LINES; + + for (i = 0; i < info->num_planes; i++) { + struct vc4_upm_refcounts *refcount; + int upm_handle; + unsigned long irqflags; + size_t upm_size; + + upm_size = vc6_upm_size(state, i); + if (!upm_size) + return -EINVAL; + upm_handle = vc4_state->upm_handle[i]; + + if (upm_handle && + hvs->upm_refcounts[upm_handle].size == upm_size) { + /* Allocation is the same size as the previous user of + * the plane. Keep the allocation. + */ + vc4_state->upm_handle[i] = upm_handle; + } else { + if (upm_handle && + refcount_dec_and_test(&hvs->upm_refcounts[upm_handle].refcount)) { + vc4_plane_release_upm_ida(hvs, upm_handle); + vc4_state->upm_handle[i] = 0; + } + + upm_handle = ida_alloc_range(&hvs->upm_handles, 1, + VC4_NUM_UPM_HANDLES, + GFP_KERNEL); + if (upm_handle < 0) { + drm_dbg(drm, "Out of upm_handles\n"); + return upm_handle; + } + vc4_state->upm_handle[i] = upm_handle; + + refcount = &hvs->upm_refcounts[upm_handle]; + refcount_set(&refcount->refcount, 1); + refcount->size = upm_size; + + spin_lock_irqsave(&hvs->mm_lock, irqflags); + ret = drm_mm_insert_node_generic(&hvs->upm_mm, + &refcount->upm, + upm_size, HVS_UBM_WORD_SIZE, + 0, 0); + spin_unlock_irqrestore(&hvs->mm_lock, irqflags); + if (ret) { + drm_err(drm, "Failed to allocate UPM entry: %d\n", ret); + refcount_set(&refcount->refcount, 0); + ida_free(&hvs->upm_handles, upm_handle); + vc4_state->upm_handle[i] = 0; + return ret; + } + } + + refcount = &hvs->upm_refcounts[upm_handle]; + vc4_state->dlist[vc4_state->ptr0_offset[i]] |= + VC4_SET_FIELD(refcount->upm.start / HVS_UBM_WORD_SIZE, + SCALER6_PTR0_UPM_BASE) | + VC4_SET_FIELD(vc4_state->upm_handle[i] - 1, + SCALER6_PTR0_UPM_HANDLE) | + VC4_SET_FIELD(vc4_state->upm_buffer_lines, + SCALER6_PTR0_UPM_BUFF_SIZE); + } + + return 0; +} + +static void vc6_plane_free_upm(struct drm_plane_state *state) +{ + struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + struct drm_device *drm = state->plane->dev; + struct vc4_dev *vc4 = to_vc4_dev(drm); + struct vc4_hvs *hvs = vc4->hvs; + unsigned int i; + + WARN_ON_ONCE(vc4->gen < VC4_GEN_6_C); + + for (i = 0; i < DRM_FORMAT_MAX_PLANES; i++) { + unsigned int upm_handle; + + upm_handle = vc4_state->upm_handle[i]; + if (!upm_handle) + continue; + + if (refcount_dec_and_test(&hvs->upm_refcounts[upm_handle].refcount)) + vc4_plane_release_upm_ida(hvs, upm_handle); + vc4_state->upm_handle[i] = 0; + } +} + /* * The colorspace conversion matrices are held in 3 entries in the dlist. * Create an array of them, with entries for each full and limited mode, and @@ -834,6 +1122,11 @@ static const u32 colorspace_coeffs[2][DRM_COLOR_ENCODING_MAX][3] = { static u32 vc4_hvs4_get_alpha_blend_mode(struct drm_plane_state *state) { + struct drm_device *dev = state->state->dev; + struct vc4_dev *vc4 = to_vc4_dev(dev); + + WARN_ON_ONCE(vc4->gen != VC4_GEN_4); + if (!state->fb->format->has_alpha) return VC4_SET_FIELD(SCALER_POS2_ALPHA_MODE_FIXED, SCALER_POS2_ALPHA_MODE); @@ -855,25 +1148,56 @@ static u32 vc4_hvs4_get_alpha_blend_mode(struct drm_plane_state *state) static u32 vc4_hvs5_get_alpha_blend_mode(struct drm_plane_state *state) { - if (!state->fb->format->has_alpha) - return VC4_SET_FIELD(SCALER5_CTL2_ALPHA_MODE_FIXED, - SCALER5_CTL2_ALPHA_MODE); + struct drm_device *dev = state->state->dev; + struct vc4_dev *vc4 = to_vc4_dev(dev); - switch (state->pixel_blend_mode) { - case DRM_MODE_BLEND_PIXEL_NONE: - return VC4_SET_FIELD(SCALER5_CTL2_ALPHA_MODE_FIXED, - SCALER5_CTL2_ALPHA_MODE); + WARN_ON_ONCE(vc4->gen != VC4_GEN_5 && vc4->gen != VC4_GEN_6_C && + vc4->gen != VC4_GEN_6_D); + + switch (vc4->gen) { default: - case DRM_MODE_BLEND_PREMULTI: - return VC4_SET_FIELD(SCALER5_CTL2_ALPHA_MODE_PIPELINE, - SCALER5_CTL2_ALPHA_MODE) | - SCALER5_CTL2_ALPHA_PREMULT; - case DRM_MODE_BLEND_COVERAGE: - return VC4_SET_FIELD(SCALER5_CTL2_ALPHA_MODE_PIPELINE, - SCALER5_CTL2_ALPHA_MODE); + case VC4_GEN_5: + case VC4_GEN_6_C: + if (!state->fb->format->has_alpha) + return VC4_SET_FIELD(SCALER5_CTL2_ALPHA_MODE_FIXED, + SCALER5_CTL2_ALPHA_MODE); + + switch (state->pixel_blend_mode) { + case DRM_MODE_BLEND_PIXEL_NONE: + return VC4_SET_FIELD(SCALER5_CTL2_ALPHA_MODE_FIXED, + SCALER5_CTL2_ALPHA_MODE); + default: + case DRM_MODE_BLEND_PREMULTI: + return VC4_SET_FIELD(SCALER5_CTL2_ALPHA_MODE_PIPELINE, + SCALER5_CTL2_ALPHA_MODE) | + SCALER5_CTL2_ALPHA_PREMULT; + case DRM_MODE_BLEND_COVERAGE: + return VC4_SET_FIELD(SCALER5_CTL2_ALPHA_MODE_PIPELINE, + SCALER5_CTL2_ALPHA_MODE); + } + case VC4_GEN_6_D: + /* 2712-D configures fixed alpha mode in CTL0 */ + return state->pixel_blend_mode == DRM_MODE_BLEND_PREMULTI ? + SCALER5_CTL2_ALPHA_PREMULT : 0; } } +static u32 vc4_hvs6_get_alpha_mask_mode(struct drm_plane_state *state) +{ + struct drm_device *dev = state->state->dev; + struct vc4_dev *vc4 = to_vc4_dev(dev); + + WARN_ON_ONCE(vc4->gen != VC4_GEN_6_C && vc4->gen != VC4_GEN_6_D); + + if (vc4->gen == VC4_GEN_6_D && + (!state->fb->format->has_alpha || + state->pixel_blend_mode == DRM_MODE_BLEND_PIXEL_NONE)) + return VC4_SET_FIELD(SCALER6D_CTL0_ALPHA_MASK_FIXED, + SCALER6_CTL0_ALPHA_MASK); + + return VC4_SET_FIELD(SCALER6_CTL0_ALPHA_MASK_NONE, SCALER6_CTL0_ALPHA_MASK); +} + /* Writes out a full display list for an active plane to the plane's * private dlist state. */ @@ -906,6 +1230,13 @@ static int vc4_plane_mode_set(struct drm_plane *plane, if (ret) return ret; + if (!vc4_state->src_w[0] || !vc4_state->src_h[0] || + !vc4_state->crtc_w || !vc4_state->crtc_h) { + /* 0 source size probably means the plane is offscreen */ + vc4_state->dlist_initialized = 1; + return 0; + } + width = vc4_state->src_w[0] >> 16; height = vc4_state->src_h[0] >> 16; @@ -1363,6 +1694,427 @@ static int vc4_plane_mode_set(struct drm_plane *plane, return 0; } +static u32 vc6_plane_get_csc_mode(struct vc4_plane_state *vc4_state) +{ + struct drm_plane_state *state = &vc4_state->base; + struct vc4_dev *vc4 = to_vc4_dev(state->plane->dev); + u32 ret = 0; + + if (vc4_state->is_yuv) { + enum drm_color_encoding color_encoding = state->color_encoding; + enum drm_color_range color_range = state->color_range; + + /* CSC pre-loaded with: + * 0 = BT601 limited range + * 1 = BT709 limited range + * 2 = BT2020 limited range + * 3 = BT601 full range + * 4 = BT709 full range + * 5 = BT2020 full range + */ + if (color_encoding > DRM_COLOR_YCBCR_BT2020) + color_encoding = DRM_COLOR_YCBCR_BT601; + if (color_range > DRM_COLOR_YCBCR_FULL_RANGE) + color_range = DRM_COLOR_YCBCR_LIMITED_RANGE; + + if (vc4->gen == VC4_GEN_6_C) { + ret |= SCALER6C_CTL2_CSC_ENABLE; + ret |= VC4_SET_FIELD(color_encoding + (color_range * 3), + SCALER6C_CTL2_BRCM_CFC_CONTROL); + } else { + ret |= SCALER6D_CTL2_CSC_ENABLE; + ret |= VC4_SET_FIELD(color_encoding + (color_range * 3), + SCALER6D_CTL2_BRCM_CFC_CONTROL); + } + } + + return ret; +} + +static int vc6_plane_mode_set(struct drm_plane *plane, + struct drm_plane_state *state) +{ + struct drm_device *drm = plane->dev; + struct vc4_dev *vc4 = to_vc4_dev(drm); + struct vc4_plane_state *vc4_state = to_vc4_plane_state(state); + struct drm_framebuffer *fb = state->fb; + const struct hvs_format *format = vc4_get_hvs_format(fb->format->format); + u64 base_format_mod = fourcc_mod_broadcom_mod(fb->modifier); + int num_planes = fb->format->num_planes; + u32 h_subsample = fb->format->hsub; + u32 v_subsample = fb->format->vsub; + bool mix_plane_alpha; + bool covers_screen; + u32 scl0, scl1, pitch0; + u32 tiling, src_x, src_y; + u32 width, height; + u32 hvs_format = format->hvs; + u32 offsets[3] = { 0 }; + unsigned int rotation; + int ret, i; + + if (vc4_state->dlist_initialized) + return 0; + + ret = vc4_plane_setup_clipping_and_scaling(state); + if (ret) + return ret; + + if (!vc4_state->src_w[0] || !vc4_state->src_h[0] || + !vc4_state->crtc_w || !vc4_state->crtc_h) { + /* 0 source size probably means the plane is offscreen. + * 0 destination size is a redundant plane. + */ + vc4_state->dlist_initialized = 1; + return 0; + } + + width = vc4_state->src_w[0] >> 16; + height = vc4_state->src_h[0] >> 16; + + /* SCL1 is used for Cb/Cr scaling of planar formats. For RGB + * and 4:4:4, scl1 should be set to scl0 so both channels of + * the scaler do the same thing. For YUV, the Y plane needs + * to be put in channel 1 and Cb/Cr in channel 0, so we swap + * the scl fields here. + */ + if (num_planes == 1) { + scl0 = vc4_get_scl_field(state, 0); + scl1 = scl0; + } else { + scl0 = vc4_get_scl_field(state, 1); + scl1 = vc4_get_scl_field(state, 0); + } + + rotation = drm_rotation_simplify(state->rotation, + DRM_MODE_ROTATE_0 | + DRM_MODE_REFLECT_X | + DRM_MODE_REFLECT_Y); + + /* We must point to the last line when Y reflection is enabled. */ + src_y = vc4_state->src_y >> 16; + if (rotation & DRM_MODE_REFLECT_Y) + src_y += height - 1; + + src_x = vc4_state->src_x >> 16; + + switch (base_format_mod) { + case DRM_FORMAT_MOD_LINEAR: + tiling = SCALER6_CTL0_ADDR_MODE_LINEAR; + + /* Adjust the base pointer to the first pixel to be scanned + * out. + */ + for (i = 0; i < num_planes; i++) { + offsets[i] += src_y / (i ? v_subsample : 1) * fb->pitches[i]; + offsets[i] += src_x / (i ? h_subsample : 1) * fb->format->cpp[i]; + } + + break; + + case DRM_FORMAT_MOD_BROADCOM_SAND128: + case DRM_FORMAT_MOD_BROADCOM_SAND256: { + uint32_t param = fourcc_mod_broadcom_param(fb->modifier); + u32 components_per_word; + u32 starting_offset; + u32 fetch_count; + + if (param > SCALER_TILE_HEIGHT_MASK) { + DRM_DEBUG_KMS("SAND height too large (%d)\n", + param); + return -EINVAL; + } + + if (fb->format->format == DRM_FORMAT_P030) { + hvs_format = HVS_PIXEL_FORMAT_YCBCR_10BIT; + tiling = SCALER6_CTL0_ADDR_MODE_128B; + } else { + hvs_format = HVS_PIXEL_FORMAT_YCBCR_YUV420_2PLANE; + + switch (base_format_mod) { + case DRM_FORMAT_MOD_BROADCOM_SAND128: + tiling = SCALER6_CTL0_ADDR_MODE_128B; + break; + case DRM_FORMAT_MOD_BROADCOM_SAND256: + tiling = SCALER6_CTL0_ADDR_MODE_256B; + break; + default: + return -EINVAL; + } + } + + /* Adjust the base pointer to the first pixel to be scanned + * out. + * + * For P030, y_ptr [31:4] is the 128bit word for the start pixel + * y_ptr [3:0] is the pixel (0-11) contained within that 128bit + * word that should be taken as the first pixel. + * Ditto uv_ptr [31:4] vs [3:0], however [3:0] contains the + * element within the 128bit word, eg for pixel 3 the value + * should be 6. + */ + for (i = 0; i < num_planes; i++) { + u32 tile_w, tile, x_off, pix_per_tile; + + if (fb->format->format == DRM_FORMAT_P030) { + /* + * Spec says: bits [31:4] of the given address + * should point to the 128-bit word containing + * the desired starting pixel, and bits[3:0] + * should be between 0 and 11, indicating which + * of the 12-pixels in that 128-bit word is the + * first pixel to be used + */ + u32 remaining_pixels = src_x % 96; + u32 aligned = remaining_pixels / 12; + u32 last_bits = remaining_pixels % 12; + + x_off = aligned * 16 + last_bits; + tile_w = 128; + pix_per_tile = 96; + } else { + switch (base_format_mod) { + case DRM_FORMAT_MOD_BROADCOM_SAND128: + tile_w = 128; + break; + case DRM_FORMAT_MOD_BROADCOM_SAND256: + tile_w = 256; + break; + default: + return -EINVAL; + } + pix_per_tile = tile_w / fb->format->cpp[0]; + x_off = (src_x % pix_per_tile) / + (i ? h_subsample : 1) * + fb->format->cpp[i]; + } + + tile = src_x / pix_per_tile; + + offsets[i] += param * tile_w * tile; + offsets[i] += src_y / (i ? v_subsample : 1) * tile_w; + offsets[i] += x_off & ~(i ? 1 : 0); + } + + components_per_word = fb->format->format == DRM_FORMAT_P030 ? 24 : 32; + starting_offset = src_x % components_per_word; + fetch_count = (width + starting_offset + components_per_word - 1) / + components_per_word; + + pitch0 = VC4_SET_FIELD(param, SCALER6_PTR2_PITCH) | + VC4_SET_FIELD(fetch_count - 1, SCALER6_PTR2_FETCH_COUNT); + break; + } + + default: + DRM_DEBUG_KMS("Unsupported FB tiling flag 0x%16llx", + (long long)fb->modifier); + return -EINVAL; + } + + /* fetch an extra pixel if we don't actually line up with the left edge. */ + if ((vc4_state->src_x & 0xffff) && vc4_state->src_x < (state->fb->width << 16)) + width++; + + /* same for the right side */ + if (((vc4_state->src_x + vc4_state->src_w[0]) & 0xffff) && + vc4_state->src_x + vc4_state->src_w[0] < (state->fb->width << 16)) + width++; + + /* now for the top */ + if ((vc4_state->src_y & 0xffff) && vc4_state->src_y < (state->fb->height << 16)) + height++; + + /* and the bottom */ + if (((vc4_state->src_y + vc4_state->src_h[0]) & 0xffff) && + vc4_state->src_y + vc4_state->src_h[0] < (state->fb->height << 16)) + height++; + + /* for YUV444 hardware wants double the width, otherwise it doesn't + * fetch full width of chroma + */ + if (format->drm == DRM_FORMAT_YUV444 || format->drm == DRM_FORMAT_YVU444) + width <<= 1; + + /* Don't waste cycles mixing with plane alpha if the set alpha + * is opaque or there is no per-pixel alpha information. + * In any case we use the alpha property value as the fixed alpha. + */ + mix_plane_alpha = state->alpha != DRM_BLEND_ALPHA_OPAQUE && + fb->format->has_alpha; + + /* Control Word 0: Scaling Configuration & Element Validity*/ + vc4_dlist_write(vc4_state, + SCALER6_CTL0_VALID | + VC4_SET_FIELD(tiling, SCALER6_CTL0_ADDR_MODE) | + vc4_hvs6_get_alpha_mask_mode(state) | + (vc4_state->is_unity ? SCALER6_CTL0_UNITY : 0) | + VC4_SET_FIELD(format->pixel_order_hvs5, SCALER6_CTL0_ORDERRGBA) | + VC4_SET_FIELD(scl1, SCALER6_CTL0_SCL1_MODE) | + VC4_SET_FIELD(scl0, SCALER6_CTL0_SCL0_MODE) | + VC4_SET_FIELD(hvs_format, SCALER6_CTL0_PIXEL_FORMAT)); + + /* Position Word 0: Image Position */ + vc4_state->pos0_offset = vc4_state->dlist_count; + vc4_dlist_write(vc4_state, + VC4_SET_FIELD(vc4_state->crtc_y, SCALER6_POS0_START_Y) | + (rotation & DRM_MODE_REFLECT_X ? SCALER6_POS0_HFLIP : 0) | + VC4_SET_FIELD(vc4_state->crtc_x, SCALER6_POS0_START_X)); + + /* Control Word 2: Alpha Value & CSC */ + vc4_dlist_write(vc4_state, + vc6_plane_get_csc_mode(vc4_state) | + vc4_hvs5_get_alpha_blend_mode(state) | + (mix_plane_alpha ? SCALER6_CTL2_ALPHA_MIX : 0) | + VC4_SET_FIELD(state->alpha >> 4, SCALER5_CTL2_ALPHA)); + + /* Position Word 1: Scaled Image Dimensions */ + if (!vc4_state->is_unity) + vc4_dlist_write(vc4_state, + VC4_SET_FIELD(vc4_state->crtc_h - 1, + SCALER6_POS1_SCL_LINES) | + VC4_SET_FIELD(vc4_state->crtc_w - 1, + SCALER6_POS1_SCL_WIDTH)); + + /* Position Word 2: Source Image Size */ + vc4_state->pos2_offset = vc4_state->dlist_count; + vc4_dlist_write(vc4_state, + VC4_SET_FIELD(height - 1, + SCALER6_POS2_SRC_LINES) | + VC4_SET_FIELD(width - 1, + SCALER6_POS2_SRC_WIDTH)); + + /* Position Word 3: Context */ + vc4_dlist_write(vc4_state, 0xc0c0c0c0); + + /* + * TODO: This only covers Raster Scan Order planes + */ + for (i = 0; i < num_planes; i++) { + struct drm_gem_dma_object *bo = drm_fb_dma_get_gem_obj(fb, i); + dma_addr_t paddr = bo->dma_addr + fb->offsets[i] + offsets[i]; + + /* Pointer Word 0 */ + vc4_state->ptr0_offset[i] = vc4_state->dlist_count; + vc4_dlist_write(vc4_state, + (rotation & DRM_MODE_REFLECT_Y ? SCALER6_PTR0_VFLIP : 0) | + /* + * The UPM buffer will be allocated in + * vc6_plane_allocate_upm(). + */ + VC4_SET_FIELD(upper_32_bits(paddr) & 0xff, + SCALER6_PTR0_UPPER_ADDR)); + + /* Pointer Word 1 */ + vc4_dlist_write(vc4_state, lower_32_bits(paddr)); + + /* Pointer Word 2 */ + if (base_format_mod != DRM_FORMAT_MOD_BROADCOM_SAND128 && + base_format_mod != DRM_FORMAT_MOD_BROADCOM_SAND256) { + vc4_dlist_write(vc4_state, + VC4_SET_FIELD(fb->pitches[i], + SCALER6_PTR2_PITCH)); + } else { + vc4_dlist_write(vc4_state, pitch0); + } + } + + /* + * Palette Word 0 + * TODO: We're not using the palette mode + */ + + /* + * Trans Word 0 + * TODO: It's only relevant if we set the trans_rgb bit in the + * control word 0, and we don't at the moment. + */ + + vc4_state->lbm_offset = 0; + + if (!vc4_state->is_unity || fb->format->is_yuv) { + /* + * Reserve a slot for the LBM Base Address. The real value will + * be set when calling vc4_plane_allocate_lbm(). + */ + if (vc4_state->y_scaling[0] != VC4_SCALING_NONE || + vc4_state->y_scaling[1] != VC4_SCALING_NONE) { + vc4_state->lbm_offset = vc4_state->dlist_count; + vc4_dlist_counter_increment(vc4_state); + } + + if (vc4_state->x_scaling[0] != VC4_SCALING_NONE || + vc4_state->x_scaling[1] != VC4_SCALING_NONE || + vc4_state->y_scaling[0] != VC4_SCALING_NONE || + vc4_state->y_scaling[1] != VC4_SCALING_NONE) { + if (num_planes > 1) + /* + * Emit Cb/Cr as channel 0 and Y as channel + * 1. This matches how we set up scl0/scl1 + * above. + */ + vc4_write_scaling_parameters(state, 1); + + vc4_write_scaling_parameters(state, 0); + } + + /* + * If any PPF setup was done, then all the kernel + * pointers get uploaded. + */ + if (vc4_state->x_scaling[0] == VC4_SCALING_PPF || + vc4_state->y_scaling[0] == VC4_SCALING_PPF || + vc4_state->x_scaling[1] == VC4_SCALING_PPF || + vc4_state->y_scaling[1] == VC4_SCALING_PPF) { + u32 kernel = + VC4_SET_FIELD(vc4->hvs->mitchell_netravali_filter.start, + SCALER_PPF_KERNEL_OFFSET); + + /* HPPF plane 0 */ + vc4_dlist_write(vc4_state, kernel); + /* VPPF plane 0 */ + vc4_dlist_write(vc4_state, kernel); + /* HPPF plane 1 */ + vc4_dlist_write(vc4_state, kernel); + /* VPPF plane 1 */ + vc4_dlist_write(vc4_state, kernel); + } + } + + vc4_dlist_write(vc4_state, SCALER6_CTL0_END); + + vc4_state->dlist[0] |= + VC4_SET_FIELD(vc4_state->dlist_count, SCALER6_CTL0_NEXT); + + /* crtc_* are already clipped coordinates. */ + covers_screen = vc4_state->crtc_x == 0 && vc4_state->crtc_y == 0 && + vc4_state->crtc_w == state->crtc->mode.hdisplay && + vc4_state->crtc_h == state->crtc->mode.vdisplay; + + /* + * Background fill might be necessary when the plane has per-pixel + * alpha content or a non-opaque plane alpha and could blend from the + * background or does not cover the entire screen. + */ + vc4_state->needs_bg_fill = fb->format->has_alpha || !covers_screen || + state->alpha != DRM_BLEND_ALPHA_OPAQUE; + + /* + * Flag the dlist as initialized to avoid checking it twice in case + * the async update check already called vc4_plane_mode_set() and + * decided to fallback to sync update because async update was not + * possible. + */ + vc4_state->dlist_initialized = 1; + + vc4_plane_calc_load(state); + + drm_dbg_driver(drm, "[PLANE:%d:%s] Computed DLIST size: %u\n", + plane->base.id, plane->name, vc4_state->dlist_count); + + return 0; +} + /* If a modeset involves changing the setup of a plane, the atomic * infrastructure will call this to validate a proposed plane setup. * However, if a plane isn't getting updated, this (and the @@ -1373,6 +2125,7 @@ static int vc4_plane_mode_set(struct drm_plane *plane, static int vc4_plane_atomic_check(struct drm_plane *plane, struct drm_atomic_state *state) { + struct vc4_dev *vc4 = to_vc4_dev(plane->dev); struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane); struct vc4_plane_state *vc4_state = to_vc4_plane_state(new_plane_state); @@ -1380,17 +2133,38 @@ static int vc4_plane_atomic_check(struct drm_plane *plane, vc4_state->dlist_count = 0; - if (!plane_enabled(new_plane_state)) + if (!plane_enabled(new_plane_state)) { + struct drm_plane_state *old_plane_state = + drm_atomic_get_old_plane_state(state, plane); + + if (vc4->gen >= VC4_GEN_6_C && old_plane_state && + plane_enabled(old_plane_state)) { + vc6_plane_free_upm(new_plane_state); + } return 0; + } - ret = vc4_plane_mode_set(plane, new_plane_state); + if (vc4->gen >= VC4_GEN_6_C) + ret = vc6_plane_mode_set(plane, new_plane_state); + else + ret = vc4_plane_mode_set(plane, new_plane_state); if (ret) return ret; + if (!vc4_state->src_w[0] || !vc4_state->src_h[0] || + !vc4_state->crtc_w || !vc4_state->crtc_h) + return 0; + ret = vc4_plane_allocate_lbm(new_plane_state); if (ret) return ret; + if (vc4->gen >= VC4_GEN_6_C) { + ret = vc6_plane_allocate_upm(new_plane_state); + if (ret) + return ret; + } + return 0; } @@ -1439,7 +2213,8 @@ void vc4_plane_async_set_fb(struct drm_plane *plane, struct drm_framebuffer *fb) { struct vc4_plane_state *vc4_state = to_vc4_plane_state(plane->state); struct drm_gem_dma_object *bo = drm_fb_dma_get_gem_obj(fb, 0); - uint32_t addr; + struct vc4_dev *vc4 = to_vc4_dev(plane->dev); + dma_addr_t dma_addr = bo->dma_addr + fb->offsets[0]; int idx; if (!drm_dev_enter(plane->dev, &idx)) @@ -1449,19 +2224,38 @@ void vc4_plane_async_set_fb(struct drm_plane *plane, struct drm_framebuffer *fb) * because this is only called on the primary plane. */ WARN_ON_ONCE(plane->state->crtc_x < 0 || plane->state->crtc_y < 0); - addr = bo->dma_addr + fb->offsets[0]; - /* Write the new address into the hardware immediately. The - * scanout will start from this address as soon as the FIFO - * needs to refill with pixels. - */ - writel(addr, &vc4_state->hw_dlist[vc4_state->ptr0_offset[0]]); + if (vc4->gen == VC4_GEN_6_C) { + u32 value; - /* Also update the CPU-side dlist copy, so that any later - * atomic updates that don't do a new modeset on our plane - * also use our updated address. - */ - vc4_state->dlist[vc4_state->ptr0_offset[0]] = addr; + value = vc4_state->dlist[vc4_state->ptr0_offset[0]] & + ~SCALER6_PTR0_UPPER_ADDR_MASK; + value |= VC4_SET_FIELD(upper_32_bits(dma_addr) & 0xff, + SCALER6_PTR0_UPPER_ADDR); + + writel(value, &vc4_state->hw_dlist[vc4_state->ptr0_offset[0]]); + vc4_state->dlist[vc4_state->ptr0_offset[0]] = value; + + value = lower_32_bits(dma_addr); + writel(value, &vc4_state->hw_dlist[vc4_state->ptr0_offset[0] + 1]); + vc4_state->dlist[vc4_state->ptr0_offset[0] + 1] = value; + } else { + u32 addr; + + addr = (u32)dma_addr; + + /* Write the new address into the hardware immediately. The + * scanout will start from this address as soon as the FIFO + * needs to refill with pixels. + */ + writel(addr, &vc4_state->hw_dlist[vc4_state->ptr0_offset[0]]); + + /* Also update the CPU-side dlist copy, so that any later + * atomic updates that don't do a new modeset on our plane + * also use our updated address. + */ + vc4_state->dlist[vc4_state->ptr0_offset[0]] = addr; + } drm_dev_exit(idx); } @@ -1543,13 +2337,17 @@ static void vc4_plane_atomic_async_update(struct drm_plane *plane, static int vc4_plane_atomic_async_check(struct drm_plane *plane, struct drm_atomic_state *state) { + struct vc4_dev *vc4 = to_vc4_dev(plane->dev); struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane); struct vc4_plane_state *old_vc4_state, *new_vc4_state; int ret; u32 i; - ret = vc4_plane_mode_set(plane, new_plane_state); + if (vc4->gen <= VC4_GEN_5) + ret = vc4_plane_mode_set(plane, new_plane_state); + else + ret = vc6_plane_mode_set(plane, new_plane_state); if (ret) return ret; @@ -1723,7 +2521,7 @@ struct drm_plane *vc4_plane_init(struct drm_device *dev, }; for (i = 0; i < ARRAY_SIZE(hvs_formats); i++) { - if (!hvs_formats[i].hvs5_only || vc4->gen == VC4_GEN_5) { + if (!hvs_formats[i].hvs5_only || vc4->gen >= VC4_GEN_5) { formats[num_formats] = hvs_formats[i].drm; num_formats++; } @@ -1738,7 +2536,7 @@ struct drm_plane *vc4_plane_init(struct drm_device *dev, return ERR_CAST(vc4_plane); plane = &vc4_plane->base; - if (vc4->gen == VC4_GEN_5) + if (vc4->gen >= VC4_GEN_5) drm_plane_helper_add(plane, &vc5_plane_helper_funcs); else drm_plane_helper_add(plane, &vc4_plane_helper_funcs); diff --git a/drivers/gpu/drm/vc4/vc4_regs.h b/drivers/gpu/drm/vc4/vc4_regs.h index c55dec383929..27158be19952 100644 --- a/drivers/gpu/drm/vc4/vc4_regs.h +++ b/drivers/gpu/drm/vc4/vc4_regs.h @@ -19,6 +19,20 @@ #define VC4_GET_FIELD(word, field) FIELD_GET(field##_MASK, word) +#define VC6_SET_FIELD(value, field) \ + ({ \ + WARN_ON(!FIELD_FIT(hvs->vc4->gen == VC4_GEN_6_C ? \ + SCALER6_ ## field ## _MASK : \ + SCALER6D_ ## field ## _MASK, value));\ + FIELD_PREP(hvs->vc4->gen == VC4_GEN_6_C ? \ + SCALER6_ ## field ## _MASK : \ + SCALER6D_ ## field ## _MASK, value); \ + }) + +#define VC6_GET_FIELD(word, field) FIELD_GET(hvs->vc4->gen == VC4_GEN_6_C ? \ + SCALER6_ ## field ## _MASK : \ + SCALER6D_ ## field ## _MASK, word) + #define V3D_IDENT0 0x00000 # define V3D_EXPECTED_IDENT0 \ ((2 << 24) | \ @@ -155,6 +169,7 @@ # define PV_CONTROL_EN BIT(0) #define PV_V_CONTROL 0x04 +# define PV_VCONTROL_ODD_TIMING BIT(29) # define PV_VCONTROL_ODD_DELAY_MASK VC4_MASK(22, 6) # define PV_VCONTROL_ODD_DELAY_SHIFT 6 # define PV_VCONTROL_ODD_FIRST BIT(5) @@ -215,6 +230,11 @@ # define PV_MUX_CFG_RGB_PIXEL_MUX_MODE_SHIFT 2 # define PV_MUX_CFG_RGB_PIXEL_MUX_MODE_NO_SWAP 8 +#define PV_PIPE_INIT_CTRL 0x94 +# define PV_PIPE_INIT_CTRL_PV_INIT_WIDTH_MASK VC4_MASK(11, 8) +# define PV_PIPE_INIT_CTRL_PV_INIT_IDLE_MASK VC4_MASK(7, 4) +# define PV_PIPE_INIT_CTRL_PV_INIT_EN BIT(0) + #define SCALER_CHANNELS_COUNT 3 #define SCALER_DISPCTRL 0x00000000 @@ -418,6 +438,10 @@ # define SCALER_DISPSTAT1_FRCNT0_SHIFT 18 # define SCALER_DISPSTAT1_FRCNT1_MASK VC4_MASK(17, 12) # define SCALER_DISPSTAT1_FRCNT1_SHIFT 12 +# define SCALER5_DISPSTAT1_FRCNT0_MASK VC4_MASK(25, 20) +# define SCALER5_DISPSTAT1_FRCNT0_SHIFT 20 +# define SCALER5_DISPSTAT1_FRCNT1_MASK VC4_MASK(19, 14) +# define SCALER5_DISPSTAT1_FRCNT1_SHIFT 14 #define SCALER_DISPSTATX(x) (SCALER_DISPSTAT0 + \ (x) * (SCALER_DISPSTAT1 - \ @@ -436,6 +460,8 @@ #define SCALER_DISPSTAT2 0x00000068 # define SCALER_DISPSTAT2_FRCNT2_MASK VC4_MASK(17, 12) # define SCALER_DISPSTAT2_FRCNT2_SHIFT 12 +# define SCALER5_DISPSTAT2_FRCNT2_MASK VC4_MASK(19, 14) +# define SCALER5_DISPSTAT2_FRCNT2_SHIFT 14 #define SCALER_DISPBASE2 0x0000006c #define SCALER_DISPALPHA2 0x00000070 @@ -514,6 +540,206 @@ #define SCALER5_DLIST_START 0x00004000 +#define SCALER6_VERSION 0x00000000 +# define SCALER6_VERSION_MASK VC4_MASK(7, 0) +# define SCALER6_VERSION_C0 0x00000053 +# define SCALER6_VERSION_D0 0x00000054 +#define SCALER6_CXM_SIZE 0x00000004 +#define SCALER6_LBM_SIZE 0x00000008 +#define SCALER6_UBM_SIZE 0x0000000c +#define SCALER6_COBA_SIZE 0x00000010 +#define SCALER6_COB_SIZE 0x00000014 + +#define SCALER6_CONTROL 0x00000020 +# define SCALER6_CONTROL_HVS_EN BIT(31) +# define SCALER6_CONTROL_PF_LINES_MASK VC4_MASK(22, 18) +# define SCALER6_CONTROL_ABORT_ON_EMPTY BIT(16) +# define SCALER6_CONTROL_DSP1_TARGET_MASK VC4_MASK(13, 12) +# define SCALER6_CONTROL_MAX_REQS_MASK VC4_MASK(7, 4) + +#define SCALER6_FETCHER_STATUS 0x00000024 +#define SCALER6_FETCH_STATUS 0x00000028 +#define SCALER6_HANDLE_ERROR 0x0000002c + +#define SCALER6_DISP0_CTRL0 0x00000030 +#define SCALER6_DISPX_CTRL0(x) ((hvs->vc4->gen == VC4_GEN_6_C) ? \ + (SCALER6_DISP0_CTRL0 + ((x) * (SCALER6_DISP1_CTRL0 - SCALER6_DISP0_CTRL0))) : \ + (SCALER6D_DISP0_CTRL0 + ((x) * (SCALER6D_DISP1_CTRL0 - SCALER6D_DISP0_CTRL0)))) +# define SCALER6_DISPX_CTRL0_ENB BIT(31) +# define SCALER6_DISPX_CTRL0_RESET BIT(30) +# define SCALER6_DISPX_CTRL0_FWIDTH_MASK VC4_MASK(28, 16) +# define SCALER6_DISPX_CTRL0_ONESHOT BIT(15) +# define SCALER6_DISPX_CTRL0_ONECTX_MASK VC4_MASK(14, 13) +# define SCALER6_DISPX_CTRL0_LINES_MASK VC4_MASK(12, 0) + +#define SCALER6_DISP0_CTRL1 0x00000034 +#define SCALER6_DISPX_CTRL1(x) ((hvs->vc4->gen == VC4_GEN_6_C) ? \ + (SCALER6_DISP0_CTRL1 + ((x) * (SCALER6_DISP1_CTRL1 - SCALER6_DISP0_CTRL1))) : \ + (SCALER6D_DISP0_CTRL1 + ((x) * (SCALER6D_DISP1_CTRL1 - SCALER6D_DISP0_CTRL1)))) +# define SCALER6_DISPX_CTRL1_BGENB BIT(8) +# define SCALER6_DISPX_CTRL1_INTLACE BIT(0) + +#define SCALER6_DISP0_BGND 0x00000038 +#define SCALER6_DISPX_BGND(x) ((hvs->vc4->gen == VC4_GEN_6_C) ? \ + (SCALER6_DISP0_BGND + ((x) * (SCALER6_DISP1_BGND - SCALER6_DISP0_BGND))) : \ + (SCALER6D_DISP0_BGND + ((x) * (SCALER6D_DISP1_BGND - SCALER6D_DISP0_BGND)))) + +#define SCALER6_DISP0_LPTRS 0x0000003c +#define SCALER6_DISPX_LPTRS(x) ((hvs->vc4->gen == VC4_GEN_6_C) ? \ + (SCALER6_DISP0_LPTRS + ((x) * (SCALER6_DISP1_LPTRS - SCALER6_DISP0_LPTRS))) : \ + (SCALER6D_DISP0_LPTRS + ((x) * (SCALER6D_DISP1_LPTRS - SCALER6D_DISP0_LPTRS)))) +# define SCALER6_DISPX_LPTRS_HEADE_MASK VC4_MASK(11, 0) + +#define SCALER6_DISP0_COB 0x00000040 +#define SCALER6_DISPX_COB(x) ((hvs->vc4->gen == VC4_GEN_6_C) ? \ + (SCALER6_DISP0_COB + ((x) * (SCALER6_DISP1_COB - SCALER6_DISP0_COB))) : \ + (SCALER6D_DISP0_COB + ((x) * (SCALER6D_DISP1_COB - SCALER6D_DISP0_COB)))) +# define SCALER6_DISPX_COB_TOP_MASK VC4_MASK(31, 16) +# define SCALER6_DISPX_COB_BASE_MASK VC4_MASK(15, 0) + +#define SCALER6_DISP0_STATUS 0x00000044 +#define SCALER6_DISPX_STATUS(x) ((hvs->vc4->gen == VC4_GEN_6_C) ? \ + (SCALER6_DISP0_STATUS + ((x) * (SCALER6_DISP1_STATUS - SCALER6_DISP0_STATUS))) : \ + (SCALER6D_DISP0_STATUS + ((x) * (SCALER6D_DISP1_STATUS - SCALER6D_DISP0_STATUS)))) +# define SCALER6_DISPX_STATUS_EMPTY BIT(22) +# define SCALER6_DISPX_STATUS_FRCNT_MASK VC4_MASK(21, 16) +# define SCALER6_DISPX_STATUS_OFIELD BIT(15) +# define SCALER6_DISPX_STATUS_MODE_MASK VC4_MASK(14, 13) +# define SCALER6_DISPX_STATUS_MODE_DISABLED 0 +# define SCALER6_DISPX_STATUS_MODE_INIT 1 +# define SCALER6_DISPX_STATUS_MODE_RUN 2 +# define SCALER6_DISPX_STATUS_MODE_EOF 3 +# define SCALER6_DISPX_STATUS_YLINE_MASK VC4_MASK(12, 0) + +#define SCALER6_DISP0_DL 0x00000048 + +#define SCALER6_DISPX_DL(x) ((hvs->vc4->gen == VC4_GEN_6_C) ? \ + (SCALER6_DISP0_DL + ((x) * (SCALER6_DISP1_DL - SCALER6_DISP0_DL))) : \ + (SCALER6D_DISP0_DL + ((x) * (SCALER6D_DISP1_DL - SCALER6D_DISP0_DL)))) +# define SCALER6_DISPX_DL_LACT_MASK VC4_MASK(11, 0) + +#define SCALER6_DISP0_RUN 0x0000004c +#define SCALER6_DISP1_CTRL0 0x00000050 +#define SCALER6_DISP1_CTRL1 0x00000054 +#define SCALER6_DISP1_BGND 0x00000058 +#define SCALER6_DISP1_LPTRS 0x0000005c +#define SCALER6_DISP1_COB 0x00000060 +#define SCALER6_DISP1_STATUS 0x00000064 +#define SCALER6_DISP1_DL 0x00000068 +#define SCALER6_DISP1_RUN 0x0000006c +#define SCALER6_DISP2_CTRL0 0x00000070 +#define SCALER6_DISP2_CTRL1 0x00000074 +#define SCALER6_DISP2_BGND 0x00000078 +#define SCALER6_DISP2_LPTRS 0x0000007c +#define SCALER6_DISP2_COB 0x00000080 +#define SCALER6_DISP2_STATUS 0x00000084 +#define SCALER6_DISP2_DL 0x00000088 +#define SCALER6_DISP2_RUN 0x0000008c +#define SCALER6_EOLN 0x00000090 +#define SCALER6_DL_STATUS 0x00000094 +#define SCALER6_BFG_MISC 0x0000009c +#define SCALER6_QOS0 0x000000a0 +#define SCALER6_PROF0 0x000000a4 +#define SCALER6_QOS1 0x000000a8 +#define SCALER6_PROF1 0x000000ac +#define SCALER6_QOS2 0x000000b0 +#define SCALER6_PROF2 0x000000b4 +#define SCALER6_PRI_MAP0 0x000000b8 +#define SCALER6_PRI_MAP1 0x000000bc +#define SCALER6_HISTCTRL 0x000000c0 +#define SCALER6_HISTBIN0 0x000000c4 +#define SCALER6_HISTBIN1 0x000000c8 +#define SCALER6_HISTBIN2 0x000000cc +#define SCALER6_HISTBIN3 0x000000d0 +#define SCALER6_HISTBIN4 0x000000d4 +#define SCALER6_HISTBIN5 0x000000d8 +#define SCALER6_HISTBIN6 0x000000dc +#define SCALER6_HISTBIN7 0x000000e0 +#define SCALER6_HDR_CFG_REMAP 0x000000f4 +#define SCALER6_COL_SPACE 0x000000f8 +#define SCALER6_HVS_ID 0x000000fc +#define SCALER6_CFC1 0x00000100 +#define SCALER6_DISP_UPM_ISO0 0x00000200 +#define SCALER6_DISP_UPM_ISO1 0x00000204 +#define SCALER6_DISP_UPM_ISO2 0x00000208 +#define SCALER6_DISP_LBM_ISO0 0x0000020c +#define SCALER6_DISP_LBM_ISO1 0x00000210 +#define SCALER6_DISP_LBM_ISO2 0x00000214 +#define SCALER6_DISP_COB_ISO0 0x00000218 +#define SCALER6_DISP_COB_ISO1 0x0000021c +#define SCALER6_DISP_COB_ISO2 0x00000220 +#define SCALER6_BAD_COB 0x00000224 +#define SCALER6_BAD_LBM 0x00000228 +#define SCALER6_BAD_UPM 0x0000022c +#define SCALER6_BAD_AXI 0x00000230 + +#define SCALER6D_VERSION 0x00000000 +#define SCALER6D_CXM_SIZE 0x00000004 +#define SCALER6D_LBM_SIZE 0x00000008 +#define SCALER6D_UBM_SIZE 0x0000000c +#define SCALER6D_COBA_SIZE 0x00000010 +#define SCALER6D_COB_SIZE 0x00000014 +#define SCALER6D_CONTROL 0x00000020 +#define SCALER6D_FETCHER_STATUS 0x00000024 +#define SCALER6D_FETCH_STATUS 0x00000028 +#define SCALER6D_HANDLE_ERROR 0x0000002c +#define SCALER6D_EOLN 0x00000030 +#define SCALER6D_DL_STATUS 0x00000034 +#define SCALER6D_PRI_MAP0 0x00000038 +#define SCALER6D_PRI_MAP1 0x0000003c +#define SCALER6D_HISTCTRL 0x000000d0 +#define SCALER6D_HISTBIN0 0x000000d4 +#define SCALER6D_HISTBIN1 0x000000d8 +#define SCALER6D_HISTBIN2 0x000000dc +#define SCALER6D_HISTBIN3 0x000000e0 +#define SCALER6D_HISTBIN4 0x000000e4 +#define SCALER6D_HISTBIN5 0x000000e8 +#define SCALER6D_HISTBIN6 0x000000ec +#define SCALER6D_HISTBIN7 0x000000f0 +#define SCALER6D_HVS_ID 0x000000fc + +#define SCALER6D_DISP0_CTRL0 0x00000100 +#define SCALER6D_DISP0_CTRL1 0x00000104 +#define SCALER6D_DISP0_BGND 0x00000108 +#define SCALER6D_DISP0_LPTRS 0x00000110 +#define SCALER6D_DISP0_COB 0x00000114 +#define SCALER6D_DISP0_STATUS 0x00000118 +#define SCALER6D_DISP0_CTRL0 0x00000100 +#define SCALER6D_DISP0_CTRL1 0x00000104 +#define SCALER6D_DISP0_BGND0 0x00000108 +#define SCALER6D_DISP0_BGND1 0x0000010c +#define SCALER6D_DISP0_LPTRS 0x00000110 +#define SCALER6D_DISP0_COB 0x00000114 +#define SCALER6D_DISP0_STATUS 0x00000118 +#define SCALER6D_DISP0_DL 0x0000011c +#define SCALER6D_DISP0_RUN 0x00000120 +#define SCALER6D_QOS0 0x00000124 +#define SCALER6D_PROF0 0x00000128 +#define SCALER6D_DISP1_CTRL0 0x00000140 +#define SCALER6D_DISP1_CTRL1 0x00000144 +#define SCALER6D_DISP1_BGND0 0x00000148 +#define SCALER6D_DISP1_BGND1 0x0000014c +#define SCALER6D_DISP1_LPTRS 0x00000150 +#define SCALER6D_DISP1_COB 0x00000154 +#define SCALER6D_DISP1_STATUS 0x00000158 +#define SCALER6D_DISP1_DL 0x0000015c +#define SCALER6D_DISP1_RUN 0x00000160 +#define SCALER6D_QOS1 0x00000164 +#define SCALER6D_PROF1 0x00000168 +#define SCALER6D_DISP2_CTRL0 0x00000180 +#define SCALER6D_DISP2_CTRL1 0x00000184 +#define SCALER6D_DISP2_BGND0 0x00000188 +#define SCALER6D_DISP2_BGND1 0x0000018c +#define SCALER6D_DISP2_LPTRS 0x00000190 +#define SCALER6D_DISP2_COB 0x00000194 +#define SCALER6D_DISP2_STATUS 0x00000198 +#define SCALER6D_DISP2_DL 0x0000019c +#define SCALER6D_DISP2_RUN 0x000001a0 +#define SCALER6D_QOS2 0x000001a4 +#define SCALER6D_PROF2 0x000001a8 + +#define SCALER6(x) ((hvs->vc4->gen == VC4_GEN_6_C) ? SCALER6_ ## x : SCALER6D_ ## x) + # define VC4_HDMI_SW_RESET_FORMAT_DETECT BIT(1) # define VC4_HDMI_SW_RESET_HDMI BIT(0) @@ -761,6 +987,15 @@ enum { # define VC4_HD_MAI_THR_DREQLOW_MASK VC4_MASK(5, 0) # define VC4_HD_MAI_THR_DREQLOW_SHIFT 0 +# define VC6_D_HD_MAI_THR_PANICHIGH_MASK VC4_MASK(29, 23) +# define VC6_D_HD_MAI_THR_PANICHIGH_SHIFT 23 +# define VC6_D_HD_MAI_THR_PANICLOW_MASK VC4_MASK(21, 15) +# define VC6_D_HD_MAI_THR_PANICLOW_SHIFT 15 +# define VC6_D_HD_MAI_THR_DREQHIGH_MASK VC4_MASK(13, 7) +# define VC6_D_HD_MAI_THR_DREQHIGH_SHIFT 7 +# define VC6_D_HD_MAI_THR_DREQLOW_MASK VC4_MASK(6, 0) +# define VC6_D_HD_MAI_THR_DREQLOW_SHIFT 0 + /* Divider from HDMI HSM clock to MAI serial clock. Sampling period * converges to N / (M + 1) cycles. */ @@ -968,6 +1203,9 @@ enum hvs_pixel_format { #define SCALER5_CTL2_ALPHA_MASK VC4_MASK(15, 4) #define SCALER5_CTL2_ALPHA_SHIFT 4 +#define SCALER6D_CTL2_CSC_ENABLE BIT(19) +#define SCALER6D_CTL2_BRCM_CFC_CONTROL_MASK VC4_MASK(22, 20) + #define SCALER_POS1_SCL_HEIGHT_MASK VC4_MASK(27, 16) #define SCALER_POS1_SCL_HEIGHT_SHIFT 16 @@ -1109,4 +1347,63 @@ enum hvs_pixel_format { #define SCALER_PITCH0_TILE_WIDTH_R_MASK VC4_MASK(6, 0) #define SCALER_PITCH0_TILE_WIDTH_R_SHIFT 0 +#define SCALER6_CTL0_END BIT(31) +#define SCALER6_CTL0_VALID BIT(30) +#define SCALER6_CTL0_NEXT_MASK VC4_MASK(29, 24) +#define SCALER6_CTL0_RGB_TRANS BIT(23) +#define SCALER6_CTL0_ADDR_MODE_MASK VC4_MASK(22, 20) +#define SCALER6_CTL0_ADDR_MODE_LINEAR 0 +#define SCALER6_CTL0_ADDR_MODE_128B 1 +#define SCALER6_CTL0_ADDR_MODE_256B 2 +#define SCALER6_CTL0_ADDR_MODE_MAP8 3 +#define SCALER6_CTL0_ADDR_MODE_UIF 4 + +#define SCALER6_CTL0_ALPHA_MASK_MASK VC4_MASK(19, 18) +#define SCALER6_CTL0_ALPHA_MASK_NONE 0 +#define SCALER6D_CTL0_ALPHA_MASK_FIXED 3 +#define SCALER6_CTL0_UNITY BIT(15) +#define SCALER6_CTL0_ORDERRGBA_MASK VC4_MASK(14, 13) +#define SCALER6_CTL0_SCL1_MODE_MASK VC4_MASK(10, 8) +#define SCALER6_CTL0_SCL0_MODE_MASK VC4_MASK(7, 5) +#define SCALER6_CTL0_PIXEL_FORMAT_MASK VC4_MASK(4, 0) + +#define SCALER6_POS0_START_Y_MASK VC4_MASK(28, 16) +#define SCALER6_POS0_HFLIP BIT(15) +#define SCALER6_POS0_START_X_MASK VC4_MASK(12, 0) + +#define SCALER6_CTL2_ALPHA_MODE_MASK VC4_MASK(31, 30) +#define SCALER6_CTL2_ALPHA_PREMULT BIT(29) +#define SCALER6_CTL2_ALPHA_MIX BIT(28) +#define SCALER6_CTL2_BFG BIT(26) +#define SCALER6C_CTL2_CSC_ENABLE BIT(25) +#define SCALER6C_CTL2_BRCM_CFC_CONTROL_MASK VC4_MASK(18, 16) +#define SCALER6_CTL2_ALPHA_MASK VC4_MASK(15, 4) + +#define SCALER6_POS1_SCL_LINES_MASK VC4_MASK(28, 16) +#define SCALER6_POS1_SCL_WIDTH_MASK VC4_MASK(12, 0) + +#define SCALER6_POS2_SRC_LINES_MASK VC4_MASK(28, 16) +#define SCALER6_POS2_SRC_WIDTH_MASK VC4_MASK(12, 0) + +#define SCALER6_PTR0_VFLIP BIT(31) +#define SCALER6_PTR0_UPM_BASE_MASK VC4_MASK(28, 16) +#define SCALER6_PTR0_UPM_HANDLE_MASK VC4_MASK(14, 10) +#define SCALER6_PTR0_UPM_BUFF_SIZE_MASK VC4_MASK(9, 8) +#define SCALER6_PTR0_UPM_BUFF_SIZE_16_LINES 3 +#define SCALER6_PTR0_UPM_BUFF_SIZE_8_LINES 2 +#define SCALER6_PTR0_UPM_BUFF_SIZE_4_LINES 1 +#define SCALER6_PTR0_UPM_BUFF_SIZE_2_LINES 0 +#define SCALER6_PTR0_UPPER_ADDR_MASK VC4_MASK(7, 0) + +#define SCALER6_PTR2_ALPHA_BPP_MASK VC4_MASK(31, 31) +#define SCALER6_PTR2_ALPHA_BPP_1BPP 1 +#define SCALER6_PTR2_ALPHA_BPP_8BPP 0 +#define SCALER6_PTR2_ALPHA_ORDER_MASK VC4_MASK(30, 30) +#define SCALER6_PTR2_ALPHA_ORDER_MSB_TO_LSB 1 +#define SCALER6_PTR2_ALPHA_ORDER_LSB_TO_MSB 0 +#define SCALER6_PTR2_ALPHA_OFFS_MASK VC4_MASK(29, 27) +#define SCALER6_PTR2_LSKIP_MASK VC4_MASK(26, 24) +#define SCALER6_PTR2_PITCH_MASK VC4_MASK(16, 0) +#define SCALER6_PTR2_FETCH_COUNT_MASK VC4_MASK(26, 16) + #endif /* VC4_REGS_H */ diff --git a/drivers/gpu/drm/vc4/vc4_txp.c b/drivers/gpu/drm/vc4/vc4_txp.c index 3e38a1d2d55e..4eab069cda75 100644 --- a/drivers/gpu/drm/vc4/vc4_txp.c +++ b/drivers/gpu/drm/vc4/vc4_txp.c @@ -145,6 +145,9 @@ /* Number of lines received and committed to memory. */ #define TXP_PROGRESS 0x10 +#define TXP_DST_PTR_HIGH_MOPLET 0x1c +#define TXP_DST_PTR_HIGH_MOP 0x24 + #define TXP_READ(offset) \ ({ \ kunit_fail_current_test("Accessing a register in a unit test!\n"); \ @@ -159,6 +162,7 @@ struct vc4_txp { struct vc4_crtc base; + const struct vc4_txp_data *data; struct platform_device *pdev; @@ -286,9 +290,13 @@ static void vc4_txp_connector_atomic_commit(struct drm_connector *conn, struct drm_connector_state *conn_state = drm_atomic_get_new_connector_state(state, conn); struct vc4_txp *txp = connector_to_vc4_txp(conn); + const struct vc4_txp_data *txp_data = txp->data; struct drm_gem_dma_object *gem; struct drm_display_mode *mode; struct drm_framebuffer *fb; + unsigned int hdisplay; + unsigned int vdisplay; + dma_addr_t addr; u32 ctrl; int idx; int i; @@ -308,9 +316,11 @@ static void vc4_txp_connector_atomic_commit(struct drm_connector *conn, return; ctrl = TXP_GO | TXP_EI | - VC4_SET_FIELD(0xf, TXP_BYTE_ENABLE) | VC4_SET_FIELD(txp_fmts[i], TXP_FORMAT); + if (txp_data->has_byte_enable) + ctrl |= VC4_SET_FIELD(0xf, TXP_BYTE_ENABLE); + if (fb->format->has_alpha) ctrl |= TXP_ALPHA_ENABLE; else @@ -324,11 +334,25 @@ static void vc4_txp_connector_atomic_commit(struct drm_connector *conn, return; gem = drm_fb_dma_get_gem_obj(fb, 0); - TXP_WRITE(TXP_DST_PTR, gem->dma_addr + fb->offsets[0]); + addr = gem->dma_addr + fb->offsets[0]; + + TXP_WRITE(TXP_DST_PTR, lower_32_bits(addr)); + + if (txp_data->supports_40bit_addresses) + TXP_WRITE(txp_data->high_addr_ptr_reg, upper_32_bits(addr) & 0xff); + TXP_WRITE(TXP_DST_PITCH, fb->pitches[0]); + + hdisplay = mode->hdisplay ?: 1; + vdisplay = mode->vdisplay ?: 1; + if (txp_data->size_minus_one) { + hdisplay -= 1; + vdisplay -= 1; + } + TXP_WRITE(TXP_DIM, - VC4_SET_FIELD(mode->hdisplay, TXP_WIDTH) | - VC4_SET_FIELD(mode->vdisplay, TXP_HEIGHT)); + VC4_SET_FIELD(hdisplay, TXP_WIDTH) | + VC4_SET_FIELD(vdisplay, TXP_HEIGHT)); TXP_WRITE(TXP_DST_CTRL, ctrl); @@ -362,6 +386,7 @@ static const struct drm_connector_funcs vc4_txp_connector_funcs = { static void vc4_txp_encoder_disable(struct drm_encoder *encoder) { struct drm_device *drm = encoder->dev; + struct vc4_dev *vc4 = to_vc4_dev(drm); struct vc4_txp *txp = encoder_to_vc4_txp(encoder); int idx; @@ -380,7 +405,8 @@ static void vc4_txp_encoder_disable(struct drm_encoder *encoder) WARN_ON(TXP_READ(TXP_DST_CTRL) & TXP_BUSY); } - TXP_WRITE(TXP_DST_CTRL, TXP_POWERDOWN); + if (vc4->gen < VC4_GEN_6_C) + TXP_WRITE(TXP_DST_CTRL, TXP_POWERDOWN); drm_dev_exit(idx); } @@ -484,17 +510,49 @@ static irqreturn_t vc4_txp_interrupt(int irq, void *data) return IRQ_HANDLED; } -const struct vc4_crtc_data vc4_txp_crtc_data = { - .name = "txp", - .debugfs_name = "txp_regs", - .hvs_available_channels = BIT(2), - .hvs_output = 2, +static const struct vc4_txp_data bcm2712_mop_data = { + .base = { + .name = "mop", + .debugfs_name = "mop_regs", + .hvs_available_channels = BIT(2), + .hvs_output = 2, + }, + .encoder_type = VC4_ENCODER_TYPE_TXP0, + .high_addr_ptr_reg = TXP_DST_PTR_HIGH_MOP, + .has_byte_enable = true, + .size_minus_one = true, + .supports_40bit_addresses = true, +}; + +static const struct vc4_txp_data bcm2712_moplet_data = { + .base = { + .name = "moplet", + .debugfs_name = "moplet_regs", + .hvs_available_channels = BIT(1), + .hvs_output = 4, + }, + .encoder_type = VC4_ENCODER_TYPE_TXP1, + .high_addr_ptr_reg = TXP_DST_PTR_HIGH_MOPLET, + .size_minus_one = true, + .supports_40bit_addresses = true, +}; + +const struct vc4_txp_data bcm2835_txp_data = { + .base = { + .name = "txp", + .debugfs_name = "txp_regs", + .hvs_available_channels = BIT(2), + .hvs_output = 2, + }, + .encoder_type = VC4_ENCODER_TYPE_TXP0, + .has_byte_enable = true, }; static int vc4_txp_bind(struct device *dev, struct device *master, void *data) { struct platform_device *pdev = to_platform_device(dev); struct drm_device *drm = dev_get_drvdata(master); + const struct vc4_txp_data *txp_data; struct vc4_encoder *vc4_encoder; struct drm_encoder *encoder; struct vc4_crtc *vc4_crtc; @@ -509,6 +567,11 @@ static int vc4_txp_bind(struct device *dev, struct device *master, void *data) if (!txp) return -ENOMEM; + txp_data = of_device_get_match_data(dev); + if (!txp_data) + return -ENODEV; + + txp->data = txp_data; txp->pdev = pdev; txp->regs = vc4_ioremap_regs(pdev, 0); if (IS_ERR(txp->regs)) @@ -519,13 +582,13 @@ static int vc4_txp_bind(struct device *dev, struct device *master, void *data) vc4_crtc->regset.regs = txp_regs; vc4_crtc->regset.nregs = ARRAY_SIZE(txp_regs); - ret = vc4_crtc_init(drm, pdev, vc4_crtc, &vc4_txp_crtc_data, + ret = vc4_crtc_init(drm, pdev, vc4_crtc, &txp_data->base, &vc4_txp_crtc_funcs, &vc4_txp_crtc_helper_funcs, true); if (ret) return ret; vc4_encoder = &txp->encoder; - txp->encoder.type = VC4_ENCODER_TYPE_TXP; + txp->encoder.type = txp_data->encoder_type; encoder = &vc4_encoder->base; encoder->possible_crtcs = drm_crtc_mask(&vc4_crtc->base); @@ -579,7 +642,9 @@ static void vc4_txp_remove(struct platform_device *pdev) } static const struct of_device_id vc4_txp_dt_match[] = { - { .compatible = "brcm,bcm2835-txp" }, + { .compatible = "brcm,bcm2712-mop", .data = &bcm2712_mop_data }, + { .compatible = "brcm,bcm2712-moplet", .data = &bcm2712_moplet_data }, + { .compatible = "brcm,bcm2835-txp", .data = &bcm2835_txp_data }, { /* sentinel */ }, }; diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c index c5e3e5457737..2752ab4f1c97 100644 --- a/drivers/gpu/drm/vgem/vgem_drv.c +++ b/drivers/gpu/drm/vgem/vgem_drv.c @@ -47,7 +47,6 @@ #define DRIVER_NAME "vgem" #define DRIVER_DESC "Virtual GEM provider" -#define DRIVER_DATE "20120112" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -121,7 +120,6 @@ static const struct drm_driver vgem_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, }; diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c index ffca6e2e1c9a..6a67c6297d58 100644 --- a/drivers/gpu/drm/virtio/virtgpu_drv.c +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c @@ -32,9 +32,9 @@ #include <linux/poll.h> #include <linux/wait.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_shmem.h> #include <drm/drm_file.h> @@ -184,7 +184,6 @@ static const struct drm_driver driver = { .postclose = virtio_gpu_driver_postclose, .dumb_create = virtio_gpu_mode_dumb_create, - .dumb_map_offset = virtio_gpu_mode_dumb_mmap, DRM_FBDEV_SHMEM_DRIVER_OPS, @@ -202,7 +201,6 @@ static const struct drm_driver driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h index 64c236169db8..f42ca9d8ed10 100644 --- a/drivers/gpu/drm/virtio/virtgpu_drv.h +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h @@ -45,7 +45,6 @@ #define DRIVER_NAME "virtio_gpu" #define DRIVER_DESC "virtio GPU" -#define DRIVER_DATE "0" #define DRIVER_MAJOR 0 #define DRIVER_MINOR 1 @@ -89,9 +88,11 @@ struct virtio_gpu_object_params { struct virtio_gpu_object { struct drm_gem_shmem_object base; + struct sg_table *sgt; uint32_t hw_res_handle; bool dumb; bool created; + bool attached; bool host3d_blob, guest_blob; uint32_t blob_mem, blob_flags; @@ -194,6 +195,13 @@ struct virtio_gpu_framebuffer { #define to_virtio_gpu_framebuffer(x) \ container_of(x, struct virtio_gpu_framebuffer, base) +struct virtio_gpu_plane_state { + struct drm_plane_state base; + struct virtio_gpu_fence *fence; +}; +#define to_virtio_gpu_plane_state(x) \ + container_of(x, struct virtio_gpu_plane_state, base) + struct virtio_gpu_queue { struct virtqueue *vq; spinlock_t qlock; @@ -301,9 +309,6 @@ void virtio_gpu_gem_object_close(struct drm_gem_object *obj, int virtio_gpu_mode_dumb_create(struct drm_file *file_priv, struct drm_device *dev, struct drm_mode_create_dumb *args); -int virtio_gpu_mode_dumb_mmap(struct drm_file *file_priv, - struct drm_device *dev, - uint32_t handle, uint64_t *offset_p); struct virtio_gpu_object_array *virtio_gpu_array_alloc(u32 nents); struct virtio_gpu_object_array* @@ -349,6 +354,10 @@ void virtio_gpu_object_attach(struct virtio_gpu_device *vgdev, struct virtio_gpu_object *obj, struct virtio_gpu_mem_entry *ents, unsigned int nents); +void virtio_gpu_object_detach(struct virtio_gpu_device *vgdev, + struct virtio_gpu_object *obj, + struct virtio_gpu_fence *fence); +int virtio_gpu_detach_object_fenced(struct virtio_gpu_object *bo); void virtio_gpu_cursor_ping(struct virtio_gpu_device *vgdev, struct virtio_gpu_output *output); int virtio_gpu_cmd_get_display_info(struct virtio_gpu_device *vgdev); @@ -468,6 +477,10 @@ struct drm_gem_object *virtgpu_gem_prime_import(struct drm_device *dev, struct drm_gem_object *virtgpu_gem_prime_import_sg_table( struct drm_device *dev, struct dma_buf_attachment *attach, struct sg_table *sgt); +int virtgpu_dma_buf_import_sgt(struct virtio_gpu_mem_entry **ents, + unsigned int *nents, + struct virtio_gpu_object *bo, + struct dma_buf_attachment *attach); /* virtgpu_debugfs.c */ void virtio_gpu_debugfs_init(struct drm_minor *minor); diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c index 7db48d17ee3a..5aab588fc400 100644 --- a/drivers/gpu/drm/virtio/virtgpu_gem.c +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c @@ -99,21 +99,6 @@ fail: return ret; } -int virtio_gpu_mode_dumb_mmap(struct drm_file *file_priv, - struct drm_device *dev, - uint32_t handle, uint64_t *offset_p) -{ - struct drm_gem_object *gobj; - - BUG_ON(!offset_p); - gobj = drm_gem_object_lookup(file_priv, handle); - if (gobj == NULL) - return -ENOENT; - *offset_p = drm_vma_node_offset_addr(&gobj->vma_node); - drm_gem_object_put(gobj); - return 0; -} - int virtio_gpu_gem_object_open(struct drm_gem_object *obj, struct drm_file *file) { @@ -127,15 +112,17 @@ int virtio_gpu_gem_object_open(struct drm_gem_object *obj, /* the context might still be missing when the first ioctl is * DRM_IOCTL_MODE_CREATE_DUMB or DRM_IOCTL_PRIME_FD_TO_HANDLE */ - virtio_gpu_create_context(obj->dev, file); + if (!vgdev->has_context_init) + virtio_gpu_create_context(obj->dev, file); objs = virtio_gpu_array_alloc(1); if (!objs) return -ENOMEM; virtio_gpu_array_add_obj(objs, obj); - virtio_gpu_cmd_context_attach_resource(vgdev, vfpriv->ctx_id, - objs); + if (vfpriv->ctx_id) + virtio_gpu_cmd_context_attach_resource(vgdev, vfpriv->ctx_id, objs); + out_notify: virtio_gpu_notify(vgdev); return 0; diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c index e4f76f315550..c33c057365f8 100644 --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c @@ -80,9 +80,9 @@ static int virtio_gpu_map_ioctl(struct drm_device *dev, void *data, struct virtio_gpu_device *vgdev = dev->dev_private; struct drm_virtgpu_map *virtio_gpu_map = data; - return virtio_gpu_mode_dumb_mmap(file, vgdev->ddev, - virtio_gpu_map->handle, - &virtio_gpu_map->offset); + return drm_gem_dumb_map_offset(file, vgdev->ddev, + virtio_gpu_map->handle, + &virtio_gpu_map->offset); } static int virtio_gpu_getparam_ioctl(struct drm_device *dev, void *data, diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c index c7e74cf13022..5517cff8715c 100644 --- a/drivers/gpu/drm/virtio/virtgpu_object.c +++ b/drivers/gpu/drm/virtio/virtgpu_object.c @@ -80,6 +80,9 @@ void virtio_gpu_cleanup_object(struct virtio_gpu_object *bo) drm_gem_free_mmap_offset(&vram->base.base.base); drm_gem_object_release(&vram->base.base.base); kfree(vram); + } else { + drm_gem_object_release(&bo->base.base); + kfree(bo); } } @@ -97,6 +100,27 @@ static void virtio_gpu_free_object(struct drm_gem_object *obj) virtio_gpu_cleanup_object(bo); } +int virtio_gpu_detach_object_fenced(struct virtio_gpu_object *bo) +{ + struct virtio_gpu_device *vgdev = bo->base.base.dev->dev_private; + struct virtio_gpu_fence *fence; + + if (!bo->attached) + return 0; + + fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0); + if (!fence) + return -ENOMEM; + + virtio_gpu_object_detach(vgdev, bo, fence); + virtio_gpu_notify(vgdev); + + dma_fence_wait(&fence->f, false); + dma_fence_put(&fence->f); + + return 0; +} + static const struct drm_gem_object_funcs virtio_gpu_shmem_funcs = { .free = virtio_gpu_free_object, .open = virtio_gpu_gem_object_open, diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c index a72a2dbda031..42aa554eca9f 100644 --- a/drivers/gpu/drm/virtio/virtgpu_plane.c +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c @@ -26,6 +26,8 @@ #include <drm/drm_atomic_helper.h> #include <drm/drm_damage_helper.h> #include <drm/drm_fourcc.h> +#include <drm/drm_gem_atomic_helper.h> +#include <linux/virtio_dma_buf.h> #include "virtgpu_drv.h" @@ -66,11 +68,28 @@ uint32_t virtio_gpu_translate_format(uint32_t drm_fourcc) return format; } +static struct +drm_plane_state *virtio_gpu_plane_duplicate_state(struct drm_plane *plane) +{ + struct virtio_gpu_plane_state *new; + + if (WARN_ON(!plane->state)) + return NULL; + + new = kzalloc(sizeof(*new), GFP_KERNEL); + if (!new) + return NULL; + + __drm_atomic_helper_plane_duplicate_state(plane, &new->base); + + return &new->base; +} + static const struct drm_plane_funcs virtio_gpu_plane_funcs = { .update_plane = drm_atomic_helper_update_plane, .disable_plane = drm_atomic_helper_disable_plane, .reset = drm_atomic_helper_plane_reset, - .atomic_duplicate_state = drm_atomic_helper_plane_duplicate_state, + .atomic_duplicate_state = virtio_gpu_plane_duplicate_state, .atomic_destroy_state = drm_atomic_helper_plane_destroy_state, }; @@ -138,11 +157,13 @@ static void virtio_gpu_resource_flush(struct drm_plane *plane, struct drm_device *dev = plane->dev; struct virtio_gpu_device *vgdev = dev->dev_private; struct virtio_gpu_framebuffer *vgfb; + struct virtio_gpu_plane_state *vgplane_st; struct virtio_gpu_object *bo; vgfb = to_virtio_gpu_framebuffer(plane->state->fb); + vgplane_st = to_virtio_gpu_plane_state(plane->state); bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]); - if (vgfb->fence) { + if (vgplane_st->fence) { struct virtio_gpu_object_array *objs; objs = virtio_gpu_array_alloc(1); @@ -151,13 +172,11 @@ static void virtio_gpu_resource_flush(struct drm_plane *plane, virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]); virtio_gpu_array_lock_resv(objs); virtio_gpu_cmd_resource_flush(vgdev, bo->hw_res_handle, x, y, - width, height, objs, vgfb->fence); + width, height, objs, + vgplane_st->fence); virtio_gpu_notify(vgdev); - - dma_fence_wait_timeout(&vgfb->fence->f, true, + dma_fence_wait_timeout(&vgplane_st->fence->f, true, msecs_to_jiffies(50)); - dma_fence_put(&vgfb->fence->f); - vgfb->fence = NULL; } else { virtio_gpu_cmd_resource_flush(vgdev, bo->hw_res_handle, x, y, width, height, NULL, NULL); @@ -241,45 +260,113 @@ static void virtio_gpu_primary_plane_update(struct drm_plane *plane, rect.y2 - rect.y1); } +static int virtio_gpu_prepare_imported_obj(struct drm_plane *plane, + struct drm_plane_state *new_state, + struct drm_gem_object *obj) +{ + struct virtio_gpu_device *vgdev = plane->dev->dev_private; + struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj); + struct dma_buf_attachment *attach = obj->import_attach; + struct dma_resv *resv = attach->dmabuf->resv; + struct virtio_gpu_mem_entry *ents = NULL; + unsigned int nents; + int ret; + + dma_resv_lock(resv, NULL); + + ret = dma_buf_pin(attach); + if (ret) { + dma_resv_unlock(resv); + return ret; + } + + if (!bo->sgt) { + ret = virtgpu_dma_buf_import_sgt(&ents, &nents, + bo, attach); + if (ret) + goto err; + + virtio_gpu_object_attach(vgdev, bo, ents, nents); + } + + dma_resv_unlock(resv); + return 0; + +err: + dma_buf_unpin(attach); + dma_resv_unlock(resv); + return ret; +} + static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane, struct drm_plane_state *new_state) { struct drm_device *dev = plane->dev; struct virtio_gpu_device *vgdev = dev->dev_private; struct virtio_gpu_framebuffer *vgfb; + struct virtio_gpu_plane_state *vgplane_st; struct virtio_gpu_object *bo; + struct drm_gem_object *obj; + int ret; if (!new_state->fb) return 0; vgfb = to_virtio_gpu_framebuffer(new_state->fb); + vgplane_st = to_virtio_gpu_plane_state(new_state); bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]); + + drm_gem_plane_helper_prepare_fb(plane, new_state); + if (!bo || (plane->type == DRM_PLANE_TYPE_PRIMARY && !bo->guest_blob)) return 0; - if (bo->dumb && (plane->state->fb != new_state->fb)) { - vgfb->fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, + obj = new_state->fb->obj[0]; + if (obj->import_attach) { + ret = virtio_gpu_prepare_imported_obj(plane, new_state, obj); + if (ret) + return ret; + } + + if (bo->dumb || obj->import_attach) { + vgplane_st->fence = virtio_gpu_fence_alloc(vgdev, + vgdev->fence_drv.context, 0); - if (!vgfb->fence) + if (!vgplane_st->fence) return -ENOMEM; } return 0; } +static void virtio_gpu_cleanup_imported_obj(struct drm_gem_object *obj) +{ + struct dma_buf_attachment *attach = obj->import_attach; + struct dma_resv *resv = attach->dmabuf->resv; + + dma_resv_lock(resv, NULL); + dma_buf_unpin(attach); + dma_resv_unlock(resv); +} + static void virtio_gpu_plane_cleanup_fb(struct drm_plane *plane, struct drm_plane_state *state) { - struct virtio_gpu_framebuffer *vgfb; + struct virtio_gpu_plane_state *vgplane_st; + struct drm_gem_object *obj; if (!state->fb) return; - vgfb = to_virtio_gpu_framebuffer(state->fb); - if (vgfb->fence) { - dma_fence_put(&vgfb->fence->f); - vgfb->fence = NULL; + vgplane_st = to_virtio_gpu_plane_state(state); + if (vgplane_st->fence) { + dma_fence_put(&vgplane_st->fence->f); + vgplane_st->fence = NULL; } + + obj = state->fb->obj[0]; + if (obj->import_attach) + virtio_gpu_cleanup_imported_obj(obj); } static void virtio_gpu_cursor_plane_update(struct drm_plane *plane, @@ -291,6 +378,7 @@ static void virtio_gpu_cursor_plane_update(struct drm_plane *plane, struct virtio_gpu_device *vgdev = dev->dev_private; struct virtio_gpu_output *output = NULL; struct virtio_gpu_framebuffer *vgfb; + struct virtio_gpu_plane_state *vgplane_st; struct virtio_gpu_object *bo = NULL; uint32_t handle; @@ -303,6 +391,7 @@ static void virtio_gpu_cursor_plane_update(struct drm_plane *plane, if (plane->state->fb) { vgfb = to_virtio_gpu_framebuffer(plane->state->fb); + vgplane_st = to_virtio_gpu_plane_state(plane->state); bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]); handle = bo->hw_res_handle; } else { @@ -322,11 +411,9 @@ static void virtio_gpu_cursor_plane_update(struct drm_plane *plane, (vgdev, 0, plane->state->crtc_w, plane->state->crtc_h, - 0, 0, objs, vgfb->fence); + 0, 0, objs, vgplane_st->fence); virtio_gpu_notify(vgdev); - dma_fence_wait(&vgfb->fence->f, true); - dma_fence_put(&vgfb->fence->f); - vgfb->fence = NULL; + dma_fence_wait(&vgplane_st->fence->f, true); } if (plane->state->fb != old_state->fb) { diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c b/drivers/gpu/drm/virtio/virtgpu_prime.c index 44425f20d91a..b3664c12843d 100644 --- a/drivers/gpu/drm/virtio/virtgpu_prime.c +++ b/drivers/gpu/drm/virtio/virtgpu_prime.c @@ -27,6 +27,8 @@ #include "virtgpu_drv.h" +MODULE_IMPORT_NS("DMA_BUF"); + static int virtgpu_virtio_get_uuid(struct dma_buf *buf, uuid_t *uuid) { @@ -142,10 +144,159 @@ struct dma_buf *virtgpu_gem_prime_export(struct drm_gem_object *obj, return buf; } +int virtgpu_dma_buf_import_sgt(struct virtio_gpu_mem_entry **ents, + unsigned int *nents, + struct virtio_gpu_object *bo, + struct dma_buf_attachment *attach) +{ + struct scatterlist *sl; + struct sg_table *sgt; + long i, ret; + + dma_resv_assert_held(attach->dmabuf->resv); + + ret = dma_resv_wait_timeout(attach->dmabuf->resv, + DMA_RESV_USAGE_KERNEL, + false, MAX_SCHEDULE_TIMEOUT); + if (ret <= 0) + return ret < 0 ? ret : -ETIMEDOUT; + + sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL); + if (IS_ERR(sgt)) + return PTR_ERR(sgt); + + *ents = kvmalloc_array(sgt->nents, + sizeof(struct virtio_gpu_mem_entry), + GFP_KERNEL); + if (!(*ents)) { + dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL); + return -ENOMEM; + } + + *nents = sgt->nents; + for_each_sgtable_dma_sg(sgt, sl, i) { + (*ents)[i].addr = cpu_to_le64(sg_dma_address(sl)); + (*ents)[i].length = cpu_to_le32(sg_dma_len(sl)); + (*ents)[i].padding = 0; + } + + bo->sgt = sgt; + return 0; +} + +static void virtgpu_dma_buf_free_obj(struct drm_gem_object *obj) +{ + struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj); + struct virtio_gpu_device *vgdev = obj->dev->dev_private; + struct dma_buf_attachment *attach = obj->import_attach; + struct dma_resv *resv = attach->dmabuf->resv; + + if (attach) { + dma_resv_lock(resv, NULL); + + virtio_gpu_detach_object_fenced(bo); + + if (bo->sgt) + dma_buf_unmap_attachment(attach, bo->sgt, + DMA_BIDIRECTIONAL); + + dma_resv_unlock(resv); + + dma_buf_detach(attach->dmabuf, attach); + dma_buf_put(attach->dmabuf); + } + + if (bo->created) { + virtio_gpu_cmd_unref_resource(vgdev, bo); + virtio_gpu_notify(vgdev); + return; + } + virtio_gpu_cleanup_object(bo); +} + +static int virtgpu_dma_buf_init_obj(struct drm_device *dev, + struct virtio_gpu_object *bo, + struct dma_buf_attachment *attach) +{ + struct virtio_gpu_device *vgdev = dev->dev_private; + struct virtio_gpu_object_params params = { 0 }; + struct dma_resv *resv = attach->dmabuf->resv; + struct virtio_gpu_mem_entry *ents = NULL; + unsigned int nents; + int ret; + + ret = virtio_gpu_resource_id_get(vgdev, &bo->hw_res_handle); + if (ret) { + virtgpu_dma_buf_free_obj(&bo->base.base); + return ret; + } + + dma_resv_lock(resv, NULL); + + ret = dma_buf_pin(attach); + if (ret) + goto err_pin; + + ret = virtgpu_dma_buf_import_sgt(&ents, &nents, bo, attach); + if (ret) + goto err_import; + + params.blob = true; + params.blob_mem = VIRTGPU_BLOB_MEM_GUEST; + params.blob_flags = VIRTGPU_BLOB_FLAG_USE_SHAREABLE; + params.size = attach->dmabuf->size; + + virtio_gpu_cmd_resource_create_blob(vgdev, bo, ¶ms, + ents, nents); + bo->guest_blob = true; + bo->attached = true; + + dma_buf_unpin(attach); + dma_resv_unlock(resv); + + return 0; + +err_import: + dma_buf_unpin(attach); +err_pin: + dma_resv_unlock(resv); + virtgpu_dma_buf_free_obj(&bo->base.base); + return ret; +} + +static const struct drm_gem_object_funcs virtgpu_gem_dma_buf_funcs = { + .free = virtgpu_dma_buf_free_obj, +}; + +static void virtgpu_dma_buf_move_notify(struct dma_buf_attachment *attach) +{ + struct drm_gem_object *obj = attach->importer_priv; + struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj); + + if (bo->created && kref_read(&obj->refcount)) { + virtio_gpu_detach_object_fenced(bo); + + if (bo->sgt) + dma_buf_unmap_attachment(attach, bo->sgt, + DMA_BIDIRECTIONAL); + + bo->sgt = NULL; + } +} + +static const struct dma_buf_attach_ops virtgpu_dma_buf_attach_ops = { + .allow_peer2peer = true, + .move_notify = virtgpu_dma_buf_move_notify +}; + struct drm_gem_object *virtgpu_gem_prime_import(struct drm_device *dev, struct dma_buf *buf) { + struct virtio_gpu_device *vgdev = dev->dev_private; + struct dma_buf_attachment *attach; + struct virtio_gpu_object *bo; struct drm_gem_object *obj; + int ret; if (buf->ops == &virtgpu_dmabuf_ops.ops) { obj = buf->priv; @@ -159,7 +310,32 @@ struct drm_gem_object *virtgpu_gem_prime_import(struct drm_device *dev, } } - return drm_gem_prime_import(dev, buf); + if (!vgdev->has_resource_blob || vgdev->has_virgl_3d) + return drm_gem_prime_import(dev, buf); + + bo = kzalloc(sizeof(*bo), GFP_KERNEL); + if (!bo) + return ERR_PTR(-ENOMEM); + + obj = &bo->base.base; + obj->funcs = &virtgpu_gem_dma_buf_funcs; + drm_gem_private_object_init(dev, obj, buf->size); + + attach = dma_buf_dynamic_attach(buf, dev->dev, + &virtgpu_dma_buf_attach_ops, obj); + if (IS_ERR(attach)) { + kfree(bo); + return ERR_CAST(attach); + } + + obj->import_attach = attach; + get_dma_buf(buf); + + ret = virtgpu_dma_buf_init_obj(dev, bo, attach); + if (ret < 0) + return ERR_PTR(ret); + + return obj; } struct drm_gem_object *virtgpu_gem_prime_import_sg_table( diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c index 0d3d0d09f39b..ad91624df42d 100644 --- a/drivers/gpu/drm/virtio/virtgpu_vq.c +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c @@ -645,6 +645,23 @@ virtio_gpu_cmd_resource_attach_backing(struct virtio_gpu_device *vgdev, virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, fence); } +static void +virtio_gpu_cmd_resource_detach_backing(struct virtio_gpu_device *vgdev, + uint32_t resource_id, + struct virtio_gpu_fence *fence) +{ + struct virtio_gpu_resource_detach_backing *cmd_p; + struct virtio_gpu_vbuffer *vbuf; + + cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p)); + memset(cmd_p, 0, sizeof(*cmd_p)); + + cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING); + cmd_p->resource_id = cpu_to_le32(resource_id); + + virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, fence); +} + static void virtio_gpu_cmd_get_display_info_cb(struct virtio_gpu_device *vgdev, struct virtio_gpu_vbuffer *vbuf) { @@ -1103,8 +1120,26 @@ void virtio_gpu_object_attach(struct virtio_gpu_device *vgdev, struct virtio_gpu_mem_entry *ents, unsigned int nents) { + if (obj->attached) + return; + virtio_gpu_cmd_resource_attach_backing(vgdev, obj->hw_res_handle, ents, nents, NULL); + + obj->attached = true; +} + +void virtio_gpu_object_detach(struct virtio_gpu_device *vgdev, + struct virtio_gpu_object *obj, + struct virtio_gpu_fence *fence) +{ + if (!obj->attached) + return; + + virtio_gpu_cmd_resource_detach_backing(vgdev, obj->hw_res_handle, + fence); + + obj->attached = false; } void virtio_gpu_cursor_ping(struct virtio_gpu_device *vgdev, diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c index 3f0977d746be..b20ac1705726 100644 --- a/drivers/gpu/drm/vkms/vkms_composer.c +++ b/drivers/gpu/drm/vkms/vkms_composer.c @@ -24,64 +24,33 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha) /** * pre_mul_alpha_blend - alpha blending equation - * @frame_info: Source framebuffer's metadata * @stage_buffer: The line with the pixels from src_plane * @output_buffer: A line buffer that receives all the blends output + * @x_start: The start offset + * @pixel_count: The number of pixels to blend * - * Using the information from the `frame_info`, this blends only the - * necessary pixels from the `stage_buffer` to the `output_buffer` - * using premultiplied blend formula. + * The pixels [@x_start;@x_start+@pixel_count) in stage_buffer are blended at + * [@x_start;@x_start+@pixel_count) in output_buffer. * * The current DRM assumption is that pixel color values have been already * pre-multiplied with the alpha channel values. See more * drm_plane_create_blend_mode_property(). Also, this formula assumes a * completely opaque background. */ -static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info, - struct line_buffer *stage_buffer, - struct line_buffer *output_buffer) +static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer, + struct line_buffer *output_buffer, int x_start, int pixel_count) { - int x_dst = frame_info->dst.x1; - struct pixel_argb_u16 *out = output_buffer->pixels + x_dst; - struct pixel_argb_u16 *in = stage_buffer->pixels; - int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), - stage_buffer->n_pixels); - - for (int x = 0; x < x_limit; x++) { - out[x].a = (u16)0xffff; - out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a); - out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a); - out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a); + struct pixel_argb_u16 *out = &output_buffer->pixels[x_start]; + const struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start]; + + for (int i = 0; i < pixel_count; i++) { + out[i].a = (u16)0xffff; + out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a); + out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a); + out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a); } } -static int get_y_pos(struct vkms_frame_info *frame_info, int y) -{ - if (frame_info->rotation & DRM_MODE_REFLECT_Y) - return drm_rect_height(&frame_info->rotated) - y - 1; - - switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) { - case DRM_MODE_ROTATE_90: - return frame_info->rotated.x2 - y - 1; - case DRM_MODE_ROTATE_270: - return y + frame_info->rotated.x1; - default: - return y; - } -} - -static bool check_limit(struct vkms_frame_info *frame_info, int pos) -{ - if (drm_rotation_90_or_270(frame_info->rotation)) { - if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated)) - return true; - } else { - if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2) - return true; - } - - return false; -} static void fill_background(const struct pixel_argb_u16 *background_color, struct line_buffer *output_buffer) @@ -96,7 +65,7 @@ static u16 lerp_u16(u16 a, u16 b, s64 t) s64 a_fp = drm_int2fixp(a); s64 b_fp = drm_int2fixp(b); - s64 delta = drm_fixp_mul(b_fp - a_fp, t); + s64 delta = drm_fixp_mul(b_fp - a_fp, t); return drm_fixp2int(a_fp + delta); } @@ -164,6 +133,226 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff } /** + * direction_for_rotation() - Get the correct reading direction for a given rotation + * + * @rotation: Rotation to analyze. It correspond the field @frame_info.rotation. + * + * This function will use the @rotation setting of a source plane to compute the reading + * direction in this plane which correspond to a "left to right writing" in the CRTC. + * For example, if the buffer is reflected on X axis, the pixel must be read from right to left + * to be written from left to right on the CRTC. + */ +static enum pixel_read_direction direction_for_rotation(unsigned int rotation) +{ + struct drm_rect tmp_a, tmp_b; + int x, y; + + /* + * Points A and B are depicted as zero-size rectangles on the CRTC. + * The CRTC writing direction is from A to B. The plane reading direction + * is discovered by inverse-transforming A and B. + * The reading direction is computed by rotating the vector AB (top-left to top-right) in a + * 1x1 square. + */ + + tmp_a = DRM_RECT_INIT(0, 0, 0, 0); + tmp_b = DRM_RECT_INIT(1, 0, 0, 0); + drm_rect_rotate_inv(&tmp_a, 1, 1, rotation); + drm_rect_rotate_inv(&tmp_b, 1, 1, rotation); + + x = tmp_b.x1 - tmp_a.x1; + y = tmp_b.y1 - tmp_a.y1; + + if (x == 1 && y == 0) + return READ_LEFT_TO_RIGHT; + else if (x == -1 && y == 0) + return READ_RIGHT_TO_LEFT; + else if (y == 1 && x == 0) + return READ_TOP_TO_BOTTOM; + else if (y == -1 && x == 0) + return READ_BOTTOM_TO_TOP; + + WARN_ONCE(true, "The inverse of the rotation gives an incorrect direction."); + return READ_LEFT_TO_RIGHT; +} + +/** + * clamp_line_coordinates() - Compute and clamp the coordinate to read and write during the blend + * process. + * + * @direction: direction of the reading + * @current_plane: current plane blended + * @src_line: source line of the reading. Only the top-left coordinate is used. This rectangle + * must be rotated and have a shape of 1*pixel_count if @direction is vertical and a shape of + * pixel_count*1 if @direction is horizontal. + * @src_x_start: x start coordinate for the line reading + * @src_y_start: y start coordinate for the line reading + * @dst_x_start: x coordinate to blend the read line + * @pixel_count: number of pixels to blend + * + * This function is mainly a safety net to avoid reading outside the source buffer. As the + * userspace should never ask to read outside the source plane, all the cases covered here should + * be dead code. + */ +static void clamp_line_coordinates(enum pixel_read_direction direction, + const struct vkms_plane_state *current_plane, + const struct drm_rect *src_line, int *src_x_start, + int *src_y_start, int *dst_x_start, int *pixel_count) +{ + /* By default the start points are correct */ + *src_x_start = src_line->x1; + *src_y_start = src_line->y1; + *dst_x_start = current_plane->frame_info->dst.x1; + + /* Get the correct number of pixel to blend, it depends of the direction */ + switch (direction) { + case READ_LEFT_TO_RIGHT: + case READ_RIGHT_TO_LEFT: + *pixel_count = drm_rect_width(src_line); + break; + case READ_BOTTOM_TO_TOP: + case READ_TOP_TO_BOTTOM: + *pixel_count = drm_rect_height(src_line); + break; + } + + /* + * Clamp the coordinates to avoid reading outside the buffer + * + * This is mainly a security check to avoid reading outside the buffer, the userspace + * should never request to read outside the source buffer. + */ + switch (direction) { + case READ_LEFT_TO_RIGHT: + case READ_RIGHT_TO_LEFT: + if (*src_x_start < 0) { + *pixel_count += *src_x_start; + *dst_x_start -= *src_x_start; + *src_x_start = 0; + } + if (*src_x_start + *pixel_count > current_plane->frame_info->fb->width) + *pixel_count = max(0, (int)current_plane->frame_info->fb->width - + *src_x_start); + break; + case READ_BOTTOM_TO_TOP: + case READ_TOP_TO_BOTTOM: + if (*src_y_start < 0) { + *pixel_count += *src_y_start; + *dst_x_start -= *src_y_start; + *src_y_start = 0; + } + if (*src_y_start + *pixel_count > current_plane->frame_info->fb->height) + *pixel_count = max(0, (int)current_plane->frame_info->fb->height - + *src_y_start); + break; + } +} + +/** + * blend_line() - Blend a line from a plane to the output buffer + * + * @current_plane: current plane to work on + * @y: line to write in the output buffer + * @crtc_x_limit: width of the output buffer + * @stage_buffer: temporary buffer to convert the pixel line from the source buffer + * @output_buffer: buffer to blend the read line into. + */ +static void blend_line(struct vkms_plane_state *current_plane, int y, + int crtc_x_limit, struct line_buffer *stage_buffer, + struct line_buffer *output_buffer) +{ + int src_x_start, src_y_start, dst_x_start, pixel_count; + struct drm_rect dst_line, tmp_src, src_line; + + /* Avoid rendering useless lines */ + if (y < current_plane->frame_info->dst.y1 || + y >= current_plane->frame_info->dst.y2) + return; + + /* + * dst_line is the line to copy. The initial coordinates are inside the + * destination framebuffer, and then drm_rect_* helpers are used to + * compute the correct position into the source framebuffer. + */ + dst_line = DRM_RECT_INIT(current_plane->frame_info->dst.x1, y, + drm_rect_width(¤t_plane->frame_info->dst), + 1); + + drm_rect_fp_to_int(&tmp_src, ¤t_plane->frame_info->src); + + /* + * [1]: Clamping src_line to the crtc_x_limit to avoid writing outside of + * the destination buffer + */ + dst_line.x1 = max_t(int, dst_line.x1, 0); + dst_line.x2 = min_t(int, dst_line.x2, crtc_x_limit); + /* The destination is completely outside of the crtc. */ + if (dst_line.x2 <= dst_line.x1) + return; + + src_line = dst_line; + + /* + * Transform the coordinate x/y from the crtc to coordinates into + * coordinates for the src buffer. + * + * - Cancel the offset of the dst buffer. + * - Invert the rotation. This assumes that + * dst = drm_rect_rotate(src, rotation) (dst and src have the + * same size, but can be rotated). + * - Apply the offset of the source rectangle to the coordinate. + */ + drm_rect_translate(&src_line, -current_plane->frame_info->dst.x1, + -current_plane->frame_info->dst.y1); + drm_rect_rotate_inv(&src_line, drm_rect_width(&tmp_src), + drm_rect_height(&tmp_src), + current_plane->frame_info->rotation); + drm_rect_translate(&src_line, tmp_src.x1, tmp_src.y1); + + /* Get the correct reading direction in the source buffer. */ + + enum pixel_read_direction direction = + direction_for_rotation(current_plane->frame_info->rotation); + + /* [2]: Compute and clamp the number of pixel to read */ + clamp_line_coordinates(direction, current_plane, &src_line, &src_x_start, &src_y_start, + &dst_x_start, &pixel_count); + + if (pixel_count <= 0) { + /* Nothing to read, so avoid multiple function calls */ + return; + } + + /* + * Modify the starting point to take in account the rotation + * + * src_line is the top-left corner, so when reading READ_RIGHT_TO_LEFT or + * READ_BOTTOM_TO_TOP, it must be changed to the top-right/bottom-left + * corner. + */ + if (direction == READ_RIGHT_TO_LEFT) { + // src_x_start is now the right point + src_x_start += pixel_count - 1; + } else if (direction == READ_BOTTOM_TO_TOP) { + // src_y_start is now the bottom point + src_y_start += pixel_count - 1; + } + + /* + * Perform the conversion and the blending + * + * Here we know that the read line (x_start, y_start, pixel_count) is + * inside the source buffer [2] and we don't write outside the stage + * buffer [1]. + */ + current_plane->pixel_read_line(current_plane, src_x_start, src_y_start, direction, + pixel_count, &stage_buffer->pixels[dst_x_start]); + + pre_mul_alpha_blend(stage_buffer, output_buffer, + dst_x_start, pixel_count); +} + +/** * blend - blend the pixels from all planes and compute crc * @wb: The writeback frame buffer metadata * @crtc_state: The crtc state @@ -183,32 +372,25 @@ static void blend(struct vkms_writeback_job *wb, { struct vkms_plane_state **plane = crtc_state->active_planes; u32 n_active_planes = crtc_state->num_active_planes; - int y_pos; const struct pixel_argb_u16 background_color = { .a = 0xffff }; - size_t crtc_y_limit = crtc_state->base.mode.vdisplay; + int crtc_y_limit = crtc_state->base.mode.vdisplay; + int crtc_x_limit = crtc_state->base.mode.hdisplay; /* * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary * complexity to avoid poor blending performance. * - * The function vkms_compose_row() is used to read a line, pixel-by-pixel, into the staging - * buffer. + * The function pixel_read_line callback is used to read a line, using an efficient + * algorithm for a specific format, into the staging buffer. */ - for (size_t y = 0; y < crtc_y_limit; y++) { + for (int y = 0; y < crtc_y_limit; y++) { fill_background(&background_color, output_buffer); /* The active planes are composed associatively in z-order. */ for (size_t i = 0; i < n_active_planes; i++) { - y_pos = get_y_pos(plane[i]->frame_info, y); - - if (!check_limit(plane[i]->frame_info, y_pos)) - continue; - - vkms_compose_row(stage_buffer, plane[i], y_pos); - pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer, - output_buffer); + blend_line(plane[i], y, crtc_x_limit, stage_buffer, output_buffer); } apply_lut(crtc_state, output_buffer); @@ -216,7 +398,7 @@ static void blend(struct vkms_writeback_job *wb, *crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size); if (wb) - vkms_writeback_row(wb, output_buffer, y_pos); + vkms_writeback_row(wb, output_buffer, y); } } @@ -227,7 +409,7 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state, u32 n_active_planes = crtc_state->num_active_planes; for (size_t i = 0; i < n_active_planes; i++) - if (!planes[i]->pixel_read) + if (!planes[i]->pixel_read_line) return -1; if (active_wb && !active_wb->pixel_write) @@ -309,8 +491,8 @@ free_stage_buffer: void vkms_composer_worker(struct work_struct *work) { struct vkms_crtc_state *crtc_state = container_of(work, - struct vkms_crtc_state, - composer_work); + struct vkms_crtc_state, + composer_work); struct drm_crtc *crtc = crtc_state->base.crtc; struct vkms_writeback_job *active_wb = crtc_state->active_writeback; struct vkms_output *out = drm_crtc_to_vkms_output(crtc); @@ -335,7 +517,7 @@ void vkms_composer_worker(struct work_struct *work) crtc_state->gamma_lut.base = (struct drm_color_lut *)crtc->state->gamma_lut->data; crtc_state->gamma_lut.lut_length = crtc->state->gamma_lut->length / sizeof(struct drm_color_lut); - max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length - 1); + max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length - 1); crtc_state->gamma_lut.channel_value2index_ratio = drm_fixp_div(max_lut_index_fp, u16_max_fp); @@ -374,7 +556,7 @@ void vkms_composer_worker(struct work_struct *work) drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32); } -static const char * const pipe_crc_sources[] = {"auto"}; +static const char *const pipe_crc_sources[] = { "auto" }; const char *const *vkms_get_crc_sources(struct drm_crtc *crtc, size_t *count) diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c index bbf080d32d2c..28a57ae109fc 100644 --- a/drivers/gpu/drm/vkms/vkms_crtc.c +++ b/drivers/gpu/drm/vkms/vkms_crtc.c @@ -186,8 +186,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc, return ret; drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) { - plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, - plane); + plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane); WARN_ON(!plane_state); if (!plane_state->visible) @@ -203,8 +202,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc, i = 0; drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) { - plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, - plane); + plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane); if (!plane_state->visible) continue; diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c index 2d1e95cb66e5..e0409aba9349 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.c +++ b/drivers/gpu/drm/vkms/vkms_drv.c @@ -13,10 +13,10 @@ #include <linux/platform_device.h> #include <linux/dma-mapping.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_gem.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> -#include <drm/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_shmem.h> #include <drm/drm_file.h> @@ -34,7 +34,6 @@ #define DRIVER_NAME "vkms" #define DRIVER_DESC "Virtual Kernel Mode Setting" -#define DRIVER_DATE "20180514" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 @@ -82,8 +81,7 @@ static void vkms_atomic_commit_tail(struct drm_atomic_state *old_state) drm_atomic_helper_wait_for_flip_done(dev, old_state); for_each_old_crtc_in_state(old_state, crtc, old_crtc_state, i) { - struct vkms_crtc_state *vkms_state = - to_vkms_crtc_state(old_crtc_state); + struct vkms_crtc_state *vkms_state = to_vkms_crtc_state(old_crtc_state); flush_work(&vkms_state->composer_work); } @@ -117,7 +115,6 @@ static const struct drm_driver vkms_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, }; @@ -174,7 +171,7 @@ static int vkms_modeset_init(struct vkms_device *vkmsdev) dev->mode_config.preferred_depth = 0; dev->mode_config.helper_private = &vkms_mode_config_helpers; - return vkms_output_init(vkmsdev, 0); + return vkms_output_init(vkmsdev); } static int vkms_create(struct vkms_config *config) diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index 672fe191e239..00541eff3d1b 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -39,12 +39,8 @@ struct vkms_frame_info { struct drm_framebuffer *fb; struct drm_rect src, dst; - struct drm_rect rotated; struct iosys_map map[DRM_FORMAT_MAX_PLANES]; unsigned int rotation; - unsigned int offset; - unsigned int pitch; - unsigned int cpp; }; struct pixel_argb_u16 { @@ -56,23 +52,65 @@ struct line_buffer { struct pixel_argb_u16 *pixels; }; +/** + * typedef pixel_write_t - These functions are used to read a pixel from a + * &struct pixel_argb_u16, convert it in a specific format and write it in the @out_pixel + * buffer. + * + * @out_pixel: destination address to write the pixel + * @in_pixel: pixel to write + */ +typedef void (*pixel_write_t)(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel); + struct vkms_writeback_job { struct iosys_map data[DRM_FORMAT_MAX_PLANES]; struct vkms_frame_info wb_frame_info; - void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel); + pixel_write_t pixel_write; }; /** + * enum pixel_read_direction - Enum used internally by VKMS to represent a reading direction in a + * plane. + */ +enum pixel_read_direction { + READ_BOTTOM_TO_TOP, + READ_TOP_TO_BOTTOM, + READ_RIGHT_TO_LEFT, + READ_LEFT_TO_RIGHT +}; + +struct vkms_plane_state; + +/** + * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame, + * convert it to `struct pixel_argb_u16` and write it to @out_pixel. + * + * @plane: plane used as source for the pixel value + * @x_start: X (width) coordinate of the first pixel to copy. The caller must ensure that x_start + * is non-negative and smaller than @plane->frame_info->fb->width. + * @y_start: Y (height) coordinate of the first pixel to copy. The caller must ensure that y_start + * is non-negative and smaller than @plane->frame_info->fb->height. + * @direction: direction to use for the copy, starting at @x_start/@y_start + * @count: number of pixels to copy + * @out_pixel: pointer where to write the pixel values. They will be written from @out_pixel[0] + * (included) to @out_pixel[@count] (excluded). The caller must ensure that out_pixel have a + * length of at least @count. + */ +typedef void (*pixel_read_line_t)(const struct vkms_plane_state *plane, int x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]); + +/** * struct vkms_plane_state - Driver specific plane state * @base: base plane state * @frame_info: data required for composing computation - * @pixel_read: function to read a pixel in this plane. The creator of a struct vkms_plane_state - * must ensure that this pointer is valid + * @pixel_read_line: function to read a pixel line in this plane. The creator of a + * struct vkms_plane_state must ensure that this pointer is valid */ struct vkms_plane_state { struct drm_shadow_plane_state base; struct vkms_frame_info *frame_info; - void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel); + pixel_read_line_t pixel_read_line; }; struct vkms_plane { @@ -212,21 +250,17 @@ int vkms_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, * vkms_output_init() - Initialize all sub-components needed for a VKMS device. * * @vkmsdev: VKMS device to initialize - * @index: CRTC which can be attached to the planes. The caller must ensure that - * @index is positive and less or equals to 31. */ -int vkms_output_init(struct vkms_device *vkmsdev, int index); +int vkms_output_init(struct vkms_device *vkmsdev); /** * vkms_plane_init() - Initialize a plane * * @vkmsdev: VKMS device containing the plane * @type: type of plane to initialize - * @index: CRTC which can be attached to the plane. The caller must ensure that - * @index is positive and less or equals to 31. */ struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev, - enum drm_plane_type type, int index); + enum drm_plane_type type); /* CRC Support */ const char *const *vkms_get_crc_sources(struct drm_crtc *crtc, @@ -238,7 +272,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name, /* Composer Support */ void vkms_composer_worker(struct work_struct *work); void vkms_set_composer(struct vkms_output *out, bool enabled); -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y); void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y); /* Writeback */ diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c index e8a5cc235ebb..39b1d7c97d45 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -10,21 +10,46 @@ #include "vkms_formats.h" /** - * pixel_offset() - Get the offset of the pixel at coordinates x/y in the first plane + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y * * @frame_info: Buffer metadata * @x: The x coordinate of the wanted pixel in the buffer * @y: The y coordinate of the wanted pixel in the buffer + * @plane_index: The index of the plane to use + * @offset: The returned offset inside the buffer of the block + * @rem_x: The returned X coordinate of the requested pixel in the block + * @rem_y: The returned Y coordinate of the requested pixel in the block * - * The caller must ensure that the framebuffer associated with this request uses a pixel format - * where block_h == block_w == 1. - * If this requirement is not fulfilled, the resulting offset can point to an other pixel or - * outside of the buffer. + * As some pixel formats store multiple pixels in a block (DRM_FORMAT_R* for example), some + * pixels are not individually addressable. This function return 3 values: the offset of the + * whole block, and the coordinate of the requested pixel inside this block. + * For example, if the format is DRM_FORMAT_R1 and the requested coordinate is 13,5, the offset + * will point to the byte 5*pitches + 13/8 (second byte of the 5th line), and the rem_x/rem_y + * coordinates will be (13 % 8, 5 % 1) = (5, 0) + * + * With this function, the caller just have to extract the correct pixel from the block. */ -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y) +static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y, + int plane_index, int *offset, int *rem_x, int *rem_y) { - return frame_info->offset + (y * frame_info->pitch) - + (x * frame_info->cpp); + struct drm_framebuffer *fb = frame_info->fb; + const struct drm_format_info *format = frame_info->fb->format; + /* Directly using x and y to multiply pitches and format->ccp is not sufficient because + * in some formats a block can represent multiple pixels. + * + * Dividing x and y by the block size allows to extract the correct offset of the block + * containing the pixel. + */ + + int block_x = x / drm_format_info_block_width(format, plane_index); + int block_y = y / drm_format_info_block_height(format, plane_index); + int block_pitch = fb->pitches[plane_index] * drm_format_info_block_height(format, + plane_index); + *rem_x = x % drm_format_info_block_width(format, plane_index); + *rem_y = y % drm_format_info_block_height(format, plane_index); + *offset = fb->offsets[plane_index] + + block_y * block_pitch + + block_x * format->char_per_block[plane_index]; } /** @@ -34,145 +59,266 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int * @frame_info: Buffer metadata * @x: The x (width) coordinate inside the plane * @y: The y (height) coordinate inside the plane + * @plane_index: The index of the plane + * @addr: The returned pointer + * @rem_x: The returned X coordinate of the requested pixel in the block + * @rem_y: The returned Y coordinate of the requested pixel in the block * - * Takes the information stored in the frame_info, a pair of coordinates, and - * returns the address of the first color channel. - * This function assumes the channels are packed together, i.e. a color channel - * comes immediately after another in the memory. And therefore, this function - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21). + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address + * of the block containing this pixel and the pixel position inside this block. * - * The caller must ensure that the framebuffer associated with this request uses a pixel format - * where block_h == block_w == 1, otherwise the returned pointer can be outside the buffer. + * See @packed_pixels_offset for details about rem_x/rem_y behavior. */ -static void *packed_pixels_addr(const struct vkms_frame_info *frame_info, - int x, int y) +static void packed_pixels_addr(const struct vkms_frame_info *frame_info, + int x, int y, int plane_index, u8 **addr, int *rem_x, + int *rem_y) { - size_t offset = pixel_offset(frame_info, x, y); + int offset; - return (u8 *)frame_info->map[0].vaddr + offset; + packed_pixels_offset(frame_info, x, y, plane_index, &offset, rem_x, rem_y); + *addr = (u8 *)frame_info->map[0].vaddr + offset; } -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y) +/** + * get_block_step_bytes() - Common helper to compute the correct step value between each pixel block + * to read in a certain direction. + * + * @fb: Framebuffer to iter on + * @direction: Direction of the reading + * @plane_index: Plane to get the step from + * + * As the returned count is the number of bytes between two consecutive blocks in a direction, + * the caller may have to read multiple pixels before using the next one (for example, to read from + * left to right in a DRM_FORMAT_R1 plane, each block contains 8 pixels, so the step must be used + * only every 8 pixels). + */ +static int get_block_step_bytes(struct drm_framebuffer *fb, enum pixel_read_direction direction, + int plane_index) { - int x_src = frame_info->src.x1 >> 16; - int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16); + switch (direction) { + case READ_LEFT_TO_RIGHT: + return fb->format->char_per_block[plane_index]; + case READ_RIGHT_TO_LEFT: + return -fb->format->char_per_block[plane_index]; + case READ_TOP_TO_BOTTOM: + return (int)fb->pitches[plane_index] * drm_format_info_block_width(fb->format, + plane_index); + case READ_BOTTOM_TO_TOP: + return -(int)fb->pitches[plane_index] * drm_format_info_block_width(fb->format, + plane_index); + } - return packed_pixels_addr(frame_info, x_src, y_src); + return 0; } -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x) +/** + * packed_pixels_addr_1x1() - Get the pointer to the block containing the pixel at the given + * coordinates + * + * @frame_info: Buffer metadata + * @x: The x (width) coordinate inside the plane + * @y: The y (height) coordinate inside the plane + * @plane_index: The index of the plane + * @addr: The returned pointer + * + * This function can only be used with format where block_h == block_w == 1. + */ +static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info, + int x, int y, int plane_index, u8 **addr) { - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270)) - return limit - x - 1; - return x; + int offset, rem_x, rem_y; + + WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, + plane_index) != 1, + "%s() only support formats with block_w == 1", __func__); + WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, + plane_index) != 1, + "%s() only support formats with block_h == 1", __func__); + + packed_pixels_offset(frame_info, x, y, plane_index, &offset, &rem_x, + &rem_y); + *addr = (u8 *)frame_info->map[0].vaddr + offset; } /* - * The following functions take pixel data from the buffer and convert them to the format - * ARGB16161616 in @out_pixel. + * The following functions take pixel data (a, r, g, b, pixel, ...) and convert them to + * &struct pixel_argb_u16 * - * They are used in the vkms_compose_row() function to handle multiple formats. + * They are used in the `read_line`s functions to avoid duplicate work for some pixel formats. */ -static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) +static struct pixel_argb_u16 argb_u16_from_u8888(u8 a, u8 r, u8 g, u8 b) { + struct pixel_argb_u16 out_pixel; /* * The 257 is the "conversion ratio". This number is obtained by the * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get * the best color value in a pixel format with more possibilities. * A similar idea applies to others RGB color conversions. */ - out_pixel->a = (u16)src_pixels[3] * 257; - out_pixel->r = (u16)src_pixels[2] * 257; - out_pixel->g = (u16)src_pixels[1] * 257; - out_pixel->b = (u16)src_pixels[0] * 257; -} + out_pixel.a = (u16)a * 257; + out_pixel.r = (u16)r * 257; + out_pixel.g = (u16)g * 257; + out_pixel.b = (u16)b * 257; -static void XRGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) -{ - out_pixel->a = (u16)0xffff; - out_pixel->r = (u16)src_pixels[2] * 257; - out_pixel->g = (u16)src_pixels[1] * 257; - out_pixel->b = (u16)src_pixels[0] * 257; + return out_pixel; } -static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) +static struct pixel_argb_u16 argb_u16_from_u16161616(u16 a, u16 r, u16 g, u16 b) { - __le16 *pixels = (__force __le16 *)src_pixels; + struct pixel_argb_u16 out_pixel; - out_pixel->a = le16_to_cpu(pixels[3]); - out_pixel->r = le16_to_cpu(pixels[2]); - out_pixel->g = le16_to_cpu(pixels[1]); - out_pixel->b = le16_to_cpu(pixels[0]); + out_pixel.a = a; + out_pixel.r = r; + out_pixel.g = g; + out_pixel.b = b; + + return out_pixel; } -static void XRGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) +static struct pixel_argb_u16 argb_u16_from_le16161616(__le16 a, __le16 r, __le16 g, __le16 b) { - __le16 *pixels = (__force __le16 *)src_pixels; - - out_pixel->a = (u16)0xffff; - out_pixel->r = le16_to_cpu(pixels[2]); - out_pixel->g = le16_to_cpu(pixels[1]); - out_pixel->b = le16_to_cpu(pixels[0]); + return argb_u16_from_u16161616(le16_to_cpu(a), le16_to_cpu(r), le16_to_cpu(g), + le16_to_cpu(b)); } -static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel) +static struct pixel_argb_u16 argb_u16_from_RGB565(const __le16 *pixel) { - __le16 *pixels = (__force __le16 *)src_pixels; + struct pixel_argb_u16 out_pixel; s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31)); s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63)); - u16 rgb_565 = le16_to_cpu(*pixels); + u16 rgb_565 = le16_to_cpu(*pixel); s64 fp_r = drm_int2fixp((rgb_565 >> 11) & 0x1f); s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f); s64 fp_b = drm_int2fixp(rgb_565 & 0x1f); - out_pixel->a = (u16)0xffff; - out_pixel->r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio)); - out_pixel->g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio)); - out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio)); + out_pixel.a = (u16)0xffff; + out_pixel.r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio)); + out_pixel.g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio)); + out_pixel.b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio)); + + return out_pixel; } -/** - * vkms_compose_row - compose a single row of a plane - * @stage_buffer: output line with the composed pixels - * @plane: state of the plane that is being composed - * @y: y coordinate of the row +/* + * The following functions are read_line function for each pixel format supported by VKMS. + * + * They read a line starting at the point @x_start,@y_start following the @direction. The result + * is stored in @out_pixel and in the format ARGB16161616. + * + * These functions are very repetitive, but the innermost pixel loops must be kept inside these + * functions for performance reasons. Some benchmarking was done in [1] where having the innermost + * loop factored out of these functions showed a slowdown by a factor of three. * - * This function composes a single row of a plane. It gets the source pixels - * through the y coordinate (see get_packed_src_addr()) and goes linearly - * through the source pixel, reading the pixels and converting it to - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270, - * the source pixels are not traversed linearly. The source pixels are queried - * on each iteration in order to traverse the pixels vertically. + * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/ */ -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y) + +static void ARGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start, + enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) { - struct pixel_argb_u16 *out_pixels = stage_buffer->pixels; - struct vkms_frame_info *frame_info = plane->frame_info; - u8 *src_pixels = get_packed_src_addr(frame_info, y); - int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels); + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; - for (size_t x = 0; x < limit; x++, src_pixels += frame_info->cpp) { - int x_pos = get_x_position(frame_info, limit, x); + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels); - if (drm_rotation_90_or_270(frame_info->rotation)) - src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1) - + frame_info->cpp * y; + int step = get_block_step_bytes(plane->frame_info->fb, direction, 0); - plane->pixel_read(src_pixels, &out_pixels[x_pos]); + while (out_pixel < end) { + u8 *px = (u8 *)src_pixels; + *out_pixel = argb_u16_from_u8888(px[3], px[2], px[1], px[0]); + out_pixel += 1; + src_pixels += step; + } +} + +static void XRGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start, + enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; + + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels); + + int step = get_block_step_bytes(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + u8 *px = (u8 *)src_pixels; + *out_pixel = argb_u16_from_u8888(255, px[2], px[1], px[0]); + out_pixel += 1; + src_pixels += step; + } +} + +static void ARGB16161616_read_line(const struct vkms_plane_state *plane, int x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; + + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels); + + int step = get_block_step_bytes(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + u16 *px = (u16 *)src_pixels; + *out_pixel = argb_u16_from_u16161616(px[3], px[2], px[1], px[0]); + out_pixel += 1; + src_pixels += step; + } +} + +static void XRGB16161616_read_line(const struct vkms_plane_state *plane, int x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; + + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels); + + int step = get_block_step_bytes(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + __le16 *px = (__le16 *)src_pixels; + *out_pixel = argb_u16_from_le16161616(cpu_to_le16(0xFFFF), px[2], px[1], px[0]); + out_pixel += 1; + src_pixels += step; + } +} + +static void RGB565_read_line(const struct vkms_plane_state *plane, int x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end = out_pixel + count; + u8 *src_pixels; + + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels); + + int step = get_block_step_bytes(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + __le16 *px = (__le16 *)src_pixels; + + *out_pixel = argb_u16_from_RGB565(px); + out_pixel += 1; + src_pixels += step; } } /* * The following functions take one &struct pixel_argb_u16 and convert it to a specific format. - * The result is stored in @dst_pixels. + * The result is stored in @out_pixel. * * They are used in vkms_writeback_row() to convert and store a pixel from the src_buffer to * the writeback buffer. */ -static void argb_u16_to_ARGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel) +static void argb_u16_to_ARGB8888(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel) { /* * This sequence below is important because the format's byte order is @@ -184,43 +330,43 @@ static void argb_u16_to_ARGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel * | Addr + 2 | = Red channel * | Addr + 3 | = Alpha channel */ - dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixel->a, 257); - dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixel->r, 257); - dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixel->g, 257); - dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257); + out_pixel[3] = DIV_ROUND_CLOSEST(in_pixel->a, 257); + out_pixel[2] = DIV_ROUND_CLOSEST(in_pixel->r, 257); + out_pixel[1] = DIV_ROUND_CLOSEST(in_pixel->g, 257); + out_pixel[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257); } -static void argb_u16_to_XRGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel) +static void argb_u16_to_XRGB8888(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel) { - dst_pixels[3] = 0xff; - dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixel->r, 257); - dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixel->g, 257); - dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257); + out_pixel[3] = 0xff; + out_pixel[2] = DIV_ROUND_CLOSEST(in_pixel->r, 257); + out_pixel[1] = DIV_ROUND_CLOSEST(in_pixel->g, 257); + out_pixel[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257); } -static void argb_u16_to_ARGB16161616(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel) +static void argb_u16_to_ARGB16161616(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel) { - __le16 *pixels = (__force __le16 *)dst_pixels; + __le16 *pixel = (__le16 *)out_pixel; - pixels[3] = cpu_to_le16(in_pixel->a); - pixels[2] = cpu_to_le16(in_pixel->r); - pixels[1] = cpu_to_le16(in_pixel->g); - pixels[0] = cpu_to_le16(in_pixel->b); + pixel[3] = cpu_to_le16(in_pixel->a); + pixel[2] = cpu_to_le16(in_pixel->r); + pixel[1] = cpu_to_le16(in_pixel->g); + pixel[0] = cpu_to_le16(in_pixel->b); } -static void argb_u16_to_XRGB16161616(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel) +static void argb_u16_to_XRGB16161616(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel) { - __le16 *pixels = (__force __le16 *)dst_pixels; + __le16 *pixel = (__le16 *)out_pixel; - pixels[3] = cpu_to_le16(0xffff); - pixels[2] = cpu_to_le16(in_pixel->r); - pixels[1] = cpu_to_le16(in_pixel->g); - pixels[0] = cpu_to_le16(in_pixel->b); + pixel[3] = cpu_to_le16(0xffff); + pixel[2] = cpu_to_le16(in_pixel->r); + pixel[1] = cpu_to_le16(in_pixel->g); + pixel[0] = cpu_to_le16(in_pixel->b); } -static void argb_u16_to_RGB565(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel) +static void argb_u16_to_RGB565(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel) { - __le16 *pixels = (__force __le16 *)dst_pixels; + __le16 *pixel = (__le16 *)out_pixel; s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31)); s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63)); @@ -233,7 +379,7 @@ static void argb_u16_to_RGB565(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel) u16 g = drm_fixp2int(drm_fixp_div(fp_g, fp_g_ratio)); u16 b = drm_fixp2int(drm_fixp_div(fp_b, fp_rb_ratio)); - *pixels = cpu_to_le16(r << 11 | g << 5 | b); + *pixel = cpu_to_le16(r << 11 | g << 5 | b); } /** @@ -249,36 +395,47 @@ void vkms_writeback_row(struct vkms_writeback_job *wb, { struct vkms_frame_info *frame_info = &wb->wb_frame_info; int x_dst = frame_info->dst.x1; - u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y); + u8 *dst_pixels; + int rem_x, rem_y; + + packed_pixels_addr(frame_info, x_dst, y, 0, &dst_pixels, &rem_x, &rem_y); struct pixel_argb_u16 *in_pixels = src_buffer->pixels; int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels); - for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->cpp) + for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->fb->format->cpp[0]) wb->pixel_write(dst_pixels, &in_pixels[x]); } /** - * get_pixel_conversion_function() - Retrieve the correct read_pixel function for a specific + * get_pixel_read_line_function() - Retrieve the correct read_line function for a specific * format. The returned pointer is NULL for unsupported pixel formats. The caller must ensure that * the pointer is valid before using it in a vkms_plane_state. * * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h]) */ -void *get_pixel_conversion_function(u32 format) +pixel_read_line_t get_pixel_read_line_function(u32 format) { switch (format) { case DRM_FORMAT_ARGB8888: - return &ARGB8888_to_argb_u16; + return &ARGB8888_read_line; case DRM_FORMAT_XRGB8888: - return &XRGB8888_to_argb_u16; + return &XRGB8888_read_line; case DRM_FORMAT_ARGB16161616: - return &ARGB16161616_to_argb_u16; + return &ARGB16161616_read_line; case DRM_FORMAT_XRGB16161616: - return &XRGB16161616_to_argb_u16; + return &XRGB16161616_read_line; case DRM_FORMAT_RGB565: - return &RGB565_to_argb_u16; + return &RGB565_read_line; default: - return NULL; + /* + * This is a bug in vkms_plane_atomic_check(). All the supported + * format must: + * - Be listed in vkms_formats in vkms_plane.c + * - Have a pixel_read callback defined here + */ + pr_err("Pixel format %p4cc is not supported by VKMS planes. This is a kernel bug, atomic check must forbid this configuration.\n", + &format); + BUG(); } } @@ -289,7 +446,7 @@ void *get_pixel_conversion_function(u32 format) * * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h]) */ -void *get_pixel_write_function(u32 format) +pixel_write_t get_pixel_write_function(u32 format) { switch (format) { case DRM_FORMAT_ARGB8888: @@ -303,6 +460,14 @@ void *get_pixel_write_function(u32 format) case DRM_FORMAT_RGB565: return &argb_u16_to_RGB565; default: - return NULL; + /* + * This is a bug in vkms_writeback_atomic_check. All the supported + * format must: + * - Be listed in vkms_wb_formats in vkms_writeback.c + * - Have a pixel_write callback defined here + */ + pr_err("Pixel format %p4cc is not supported by VKMS writeback. This is a kernel bug, atomic check must forbid this configuration.\n", + &format); + BUG(); } } diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h index cf59c2ed8e9a..8d2bef95ff79 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.h +++ b/drivers/gpu/drm/vkms/vkms_formats.h @@ -5,8 +5,8 @@ #include "vkms_drv.h" -void *get_pixel_conversion_function(u32 format); +pixel_read_line_t get_pixel_read_line_function(u32 format); -void *get_pixel_write_function(u32 format); +pixel_write_t get_pixel_write_function(u32 format); #endif /* _VKMS_FORMATS_H_ */ diff --git a/drivers/gpu/drm/vkms/vkms_output.c b/drivers/gpu/drm/vkms/vkms_output.c index 25a99fde126c..8f4bd5aef087 100644 --- a/drivers/gpu/drm/vkms/vkms_output.c +++ b/drivers/gpu/drm/vkms/vkms_output.c @@ -32,29 +32,14 @@ static const struct drm_connector_helper_funcs vkms_conn_helper_funcs = { .get_modes = vkms_conn_get_modes, }; -static int vkms_add_overlay_plane(struct vkms_device *vkmsdev, int index, - struct drm_crtc *crtc) -{ - struct vkms_plane *overlay; - - overlay = vkms_plane_init(vkmsdev, DRM_PLANE_TYPE_OVERLAY, index); - if (IS_ERR(overlay)) - return PTR_ERR(overlay); - - if (!overlay->base.possible_crtcs) - overlay->base.possible_crtcs = drm_crtc_mask(crtc); - - return 0; -} - -int vkms_output_init(struct vkms_device *vkmsdev, int index) +int vkms_output_init(struct vkms_device *vkmsdev) { struct vkms_output *output = &vkmsdev->output; struct drm_device *dev = &vkmsdev->drm; struct drm_connector *connector = &output->connector; struct drm_encoder *encoder = &output->encoder; struct drm_crtc *crtc = &output->crtc; - struct vkms_plane *primary, *cursor = NULL; + struct vkms_plane *primary, *overlay, *cursor = NULL; int ret; int writeback; unsigned int n; @@ -65,29 +50,31 @@ int vkms_output_init(struct vkms_device *vkmsdev, int index) * The overlay and cursor planes are not mandatory, but can be used to perform complex * composition. */ - primary = vkms_plane_init(vkmsdev, DRM_PLANE_TYPE_PRIMARY, index); + primary = vkms_plane_init(vkmsdev, DRM_PLANE_TYPE_PRIMARY); if (IS_ERR(primary)) return PTR_ERR(primary); - if (vkmsdev->config->overlay) { - for (n = 0; n < NUM_OVERLAY_PLANES; n++) { - ret = vkms_add_overlay_plane(vkmsdev, index, crtc); - if (ret) - return ret; - } - } - if (vkmsdev->config->cursor) { - cursor = vkms_plane_init(vkmsdev, DRM_PLANE_TYPE_CURSOR, index); + cursor = vkms_plane_init(vkmsdev, DRM_PLANE_TYPE_CURSOR); if (IS_ERR(cursor)) return PTR_ERR(cursor); } - /* [1]: Allocation of a CRTC, its index will be BIT(0) = 1 */ ret = vkms_crtc_init(dev, crtc, &primary->base, &cursor->base); if (ret) return ret; + if (vkmsdev->config->overlay) { + for (n = 0; n < NUM_OVERLAY_PLANES; n++) { + overlay = vkms_plane_init(vkmsdev, DRM_PLANE_TYPE_OVERLAY); + if (IS_ERR(overlay)) { + DRM_DEV_ERROR(dev->dev, "Failed to init vkms plane\n"); + return PTR_ERR(overlay); + } + overlay->base.possible_crtcs = drm_crtc_mask(crtc); + } + } + ret = drm_connector_init(dev, connector, &vkms_connector_funcs, DRM_MODE_CONNECTOR_VIRTUAL); if (ret) { @@ -103,11 +90,7 @@ int vkms_output_init(struct vkms_device *vkmsdev, int index) DRM_ERROR("Failed to init encoder\n"); goto err_encoder; } - /* - * This is a hardcoded value to select crtc for the encoder. - * BIT(0) here designate the first registered CRTC, the one allocated in [1] - */ - encoder->possible_crtcs = BIT(0); + encoder->possible_crtcs = drm_crtc_mask(crtc); ret = drm_connector_attach_encoder(connector, encoder); if (ret) { diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c index e5c625ab8e3e..e2fce471870f 100644 --- a/drivers/gpu/drm/vkms/vkms_plane.c +++ b/drivers/gpu/drm/vkms/vkms_plane.c @@ -112,23 +112,12 @@ static void vkms_plane_atomic_update(struct drm_plane *plane, frame_info = vkms_plane_state->frame_info; memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect)); memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect)); - memcpy(&frame_info->rotated, &new_state->dst, sizeof(struct drm_rect)); frame_info->fb = fb; memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map)); drm_framebuffer_get(frame_info->fb); - frame_info->rotation = drm_rotation_simplify(new_state->rotation, DRM_MODE_ROTATE_0 | - DRM_MODE_ROTATE_90 | - DRM_MODE_ROTATE_270 | - DRM_MODE_REFLECT_X | - DRM_MODE_REFLECT_Y); - - drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated), - drm_rect_height(&frame_info->rotated), frame_info->rotation); - - frame_info->offset = fb->offsets[0]; - frame_info->pitch = fb->pitches[0]; - frame_info->cpp = fb->format->cpp[0]; - vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt); + frame_info->rotation = new_state->rotation; + + vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt); } static int vkms_plane_atomic_check(struct drm_plane *plane, @@ -198,12 +187,12 @@ static const struct drm_plane_helper_funcs vkms_plane_helper_funcs = { }; struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev, - enum drm_plane_type type, int index) + enum drm_plane_type type) { struct drm_device *dev = &vkmsdev->drm; struct vkms_plane *plane; - plane = drmm_universal_plane_alloc(dev, struct vkms_plane, base, 1 << index, + plane = drmm_universal_plane_alloc(dev, struct vkms_plane, base, 0, &vkms_plane_funcs, vkms_formats, ARRAY_SIZE(vkms_formats), NULL, type, NULL); diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c index 999d5c01ea81..79918b44fedd 100644 --- a/drivers/gpu/drm/vkms/vkms_writeback.c +++ b/drivers/gpu/drm/vkms/vkms_writeback.c @@ -149,11 +149,6 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn, crtc_state->active_writeback = active_wb; crtc_state->wb_pending = true; spin_unlock_irq(&output->composer_lock); - - wb_frame_info->offset = fb->offsets[0]; - wb_frame_info->pitch = fb->pitches[0]; - wb_frame_info->cpp = fb->format->cpp[0]; - drm_writeback_queue_job(wb_conn, connector_state); active_wb->pixel_write = get_pixel_write_function(wb_format); drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 2c46897876dd..1699236fca5a 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -35,7 +35,7 @@ #include "vmwgfx_vkms.h" #include "ttm_object.h" -#include <drm/drm_client_setup.h> +#include <drm/clients/drm_client_setup.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_ttm.h> #include <drm/drm_gem_ttm_helper.h> @@ -1634,7 +1634,6 @@ static const struct drm_driver driver = { .fops = &vmwgfx_driver_fops, .name = VMWGFX_DRIVER_NAME, .desc = VMWGFX_DRIVER_DESC, - .date = VMWGFX_DRIVER_DATE, .major = VMWGFX_DRIVER_MAJOR, .minor = VMWGFX_DRIVER_MINOR, .patchlevel = VMWGFX_DRIVER_PATCHLEVEL diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index b21831ef214a..5275ef632d4b 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -57,7 +57,6 @@ #define VMWGFX_DRIVER_NAME "vmwgfx" -#define VMWGFX_DRIVER_DATE "20211206" #define VMWGFX_DRIVER_MAJOR 2 #define VMWGFX_DRIVER_MINOR 20 #define VMWGFX_DRIVER_PATCHLEVEL 0 diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c index 39949e0a493f..f0b429525467 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c @@ -479,7 +479,6 @@ static int vmw_ldu_init(struct vmw_private *dev_priv, unsigned unit) } drm_connector_helper_add(connector, &vmw_ldu_connector_helper_funcs); - connector->status = vmw_du_connector_detect(connector, true); ret = drm_encoder_init(dev, encoder, &vmw_legacy_encoder_funcs, DRM_MODE_ENCODER_VIRTUAL, NULL); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c index 0f4bfd98480a..32029d80b72b 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c @@ -868,7 +868,6 @@ static int vmw_sou_init(struct vmw_private *dev_priv, unsigned unit) } drm_connector_helper_add(connector, &vmw_sou_connector_helper_funcs); - connector->status = vmw_du_connector_detect(connector, true); ret = drm_encoder_init(dev, encoder, &vmw_screen_object_encoder_funcs, DRM_MODE_ENCODER_VIRTUAL, NULL); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c index 82d18b88f4a7..114a75069e1c 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c @@ -1593,7 +1593,6 @@ static int vmw_stdu_init(struct vmw_private *dev_priv, unsigned unit) } drm_connector_helper_add(connector, &vmw_stdu_connector_helper_funcs); - connector->status = vmw_du_connector_detect(connector, false); ret = drm_encoder_init(dev, encoder, &vmw_stdu_encoder_funcs, DRM_MODE_ENCODER_VIRTUAL, NULL); diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index bc7a04ce69fd..7730e0596299 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -101,6 +101,7 @@ xe-y += xe_bb.o \ xe_trace.o \ xe_trace_bo.o \ xe_trace_guc.o \ + xe_trace_lrc.o \ xe_ttm_sys_mgr.o \ xe_ttm_stolen_mgr.o \ xe_ttm_vram_mgr.o \ @@ -110,6 +111,7 @@ xe-y += xe_bb.o \ xe_vm.o \ xe_vram.o \ xe_vram_freq.o \ + xe_vsec.o \ xe_wait_user_fence.o \ xe_wa.o \ xe_wopcm.o @@ -124,7 +126,8 @@ xe-y += \ xe_gt_sriov_vf.o \ xe_guc_relay.o \ xe_memirq.o \ - xe_sriov.o + xe_sriov.o \ + xe_sriov_vf.o xe-$(CONFIG_PCI_IOV) += \ xe_gt_sriov_pf.o \ diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h index b54fe40fc5a9..fee385532fb0 100644 --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h @@ -134,6 +134,8 @@ enum xe_guc_action { XE_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503, XE_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505, XE_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506, + XE_GUC_ACTION_REGISTER_G2G = 0x4507, + XE_GUC_ACTION_DEREGISTER_G2G = 0x4508, XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600, XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601, XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507, @@ -218,4 +220,22 @@ enum xe_guc_tlb_inval_mode { XE_GUC_TLB_INVAL_MODE_LITE = 0x1, }; +/* + * GuC to GuC communication (de-)registration fields: + */ +enum xe_guc_g2g_type { + XE_G2G_TYPE_IN = 0x0, + XE_G2G_TYPE_OUT, + XE_G2G_TYPE_LIMIT, +}; + +#define XE_G2G_REGISTER_DEVICE REG_GENMASK(16, 16) +#define XE_G2G_REGISTER_TILE REG_GENMASK(15, 12) +#define XE_G2G_REGISTER_TYPE REG_GENMASK(11, 8) +#define XE_G2G_REGISTER_SIZE REG_GENMASK(7, 0) + +#define XE_G2G_DEREGISTER_DEVICE REG_GENMASK(16, 16) +#define XE_G2G_DEREGISTER_TILE REG_GENMASK(15, 12) +#define XE_G2G_DEREGISTER_TYPE REG_GENMASK(11, 8) + #endif diff --git a/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h index b6a1852749dd..0b28659d94e9 100644 --- a/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h @@ -502,6 +502,44 @@ #define VF2GUC_VF_RESET_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0 /** + * DOC: VF2GUC_NOTIFY_RESFIX_DONE + * + * This action is used by VF to notify the GuC that the VF KMD has completed + * post-migration recovery steps. + * + * This message must be sent as `MMIO HXG Message`_. + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | DATA0 = MBZ | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | ACTION = _`GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE` = 0x5508 | + * +---+-------+--------------------------------------------------------------+ + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | DATA0 = MBZ | + * +---+-------+--------------------------------------------------------------+ + */ +#define GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE 0x5508u + +#define VF2GUC_NOTIFY_RESFIX_DONE_REQUEST_MSG_LEN GUC_HXG_REQUEST_MSG_MIN_LEN +#define VF2GUC_NOTIFY_RESFIX_DONE_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 + +#define VF2GUC_NOTIFY_RESFIX_DONE_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN +#define VF2GUC_NOTIFY_RESFIX_DONE_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0 + +/** * DOC: VF2GUC_QUERY_SINGLE_KLV * * This action is used by VF to query value of the single KLV data. diff --git a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h index 37606cf8cc5e..7dcb118e3d9f 100644 --- a/drivers/gpu/drm/xe/abi/guc_klvs_abi.h +++ b/drivers/gpu/drm/xe/abi/guc_klvs_abi.h @@ -291,6 +291,14 @@ enum { * * :0: (default) * :1-65535: number of contexts (Gen12) + * + * _`GUC_KLV_VF_CFG_SCHED_PRIORITY` : 0x8A0C + * This config controls VF’s scheduling priority. + * + * :0: LOW = schedule VF only if it has active work (default) + * :1: NORMAL = schedule VF always, irrespective of whether it has work or not + * :2: HIGH = schedule VF in the next time-slice after current active + * time-slice completes if it has active work */ #define GUC_KLV_VF_CFG_GGTT_START_KEY 0x0001 @@ -343,6 +351,12 @@ enum { #define GUC_KLV_VF_CFG_BEGIN_CONTEXT_ID_KEY 0x8a0b #define GUC_KLV_VF_CFG_BEGIN_CONTEXT_ID_LEN 1u +#define GUC_KLV_VF_CFG_SCHED_PRIORITY_KEY 0x8a0c +#define GUC_KLV_VF_CFG_SCHED_PRIORITY_LEN 1u +#define GUC_SCHED_PRIORITY_LOW 0u +#define GUC_SCHED_PRIORITY_NORMAL 1u +#define GUC_SCHED_PRIORITY_HIGH 2u + /* * Workaround keys: */ diff --git a/drivers/gpu/drm/xe/display/ext/i915_irq.c b/drivers/gpu/drm/xe/display/ext/i915_irq.c index a7dbc6554d69..ac4cda2d81c7 100644 --- a/drivers/gpu/drm/xe/display/ext/i915_irq.c +++ b/drivers/gpu/drm/xe/display/ext/i915_irq.c @@ -53,18 +53,7 @@ void gen2_irq_init(struct intel_uncore *uncore, struct i915_irq_regs regs, bool intel_irqs_enabled(struct xe_device *xe) { - /* - * XXX: i915 has a racy handling of the irq.enabled, since it doesn't - * lock its transitions. Because of that, the irq.enabled sometimes - * is not read with the irq.lock in place. - * However, the most critical cases like vblank and page flips are - * properly using the locks. - * We cannot take the lock in here or run any kind of assert because - * of i915 inconsistency. - * But at this point the xe irq is better protected against races, - * although the full solution would be protecting the i915 side. - */ - return xe->irq.enabled; + return atomic_read(&xe->irq.enabled); } void intel_synchronize_irq(struct xe_device *xe) diff --git a/drivers/gpu/drm/xe/display/intel_bo.c b/drivers/gpu/drm/xe/display/intel_bo.c index 9f54fad0f1c0..b463f5bd4eed 100644 --- a/drivers/gpu/drm/xe/display/intel_bo.c +++ b/drivers/gpu/drm/xe/display/intel_bo.c @@ -40,31 +40,8 @@ int intel_bo_fb_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma) int intel_bo_read_from_page(struct drm_gem_object *obj, u64 offset, void *dst, int size) { struct xe_bo *bo = gem_to_xe_bo(obj); - struct ttm_bo_kmap_obj map; - void *src; - bool is_iomem; - int ret; - ret = xe_bo_lock(bo, true); - if (ret) - return ret; - - ret = ttm_bo_kmap(&bo->ttm, offset >> PAGE_SHIFT, 1, &map); - if (ret) - goto out_unlock; - - offset &= ~PAGE_MASK; - src = ttm_kmap_obj_virtual(&map, &is_iomem); - src += offset; - if (is_iomem) - memcpy_fromio(dst, (void __iomem *)src, size); - else - memcpy(dst, src, size); - - ttm_bo_kunmap(&map); -out_unlock: - xe_bo_unlock(bo); - return ret; + return xe_bo_read(bo, offset, dst, size); } struct intel_frontbuffer *intel_bo_get_frontbuffer(struct drm_gem_object *obj) diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c index 761510ae0690..9fa51b84737c 100644 --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c @@ -161,7 +161,7 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb, } vma->dpt = dpt; - vma->node = dpt->ggtt_node; + vma->node = dpt->ggtt_node[tile0->id]; return 0; } @@ -213,8 +213,8 @@ static int __xe_pin_fb_vma_ggtt(const struct intel_framebuffer *fb, if (xe_bo_is_vram(bo) && ggtt->flags & XE_GGTT_FLAGS_64K) align = max_t(u32, align, SZ_64K); - if (bo->ggtt_node && view->type == I915_GTT_VIEW_NORMAL) { - vma->node = bo->ggtt_node; + if (bo->ggtt_node[ggtt->tile->id] && view->type == I915_GTT_VIEW_NORMAL) { + vma->node = bo->ggtt_node[ggtt->tile->id]; } else if (view->type == I915_GTT_VIEW_NORMAL) { u32 x, size = bo->ttm.base.size; @@ -345,10 +345,12 @@ err: static void __xe_unpin_fb_vma(struct i915_vma *vma) { + u8 tile_id = vma->node->ggtt->tile->id; + if (vma->dpt) xe_bo_unpin_map_no_vm(vma->dpt); - else if (!xe_ggtt_node_allocated(vma->bo->ggtt_node) || - vma->bo->ggtt_node->base.start != vma->node->base.start) + else if (!xe_ggtt_node_allocated(vma->bo->ggtt_node[tile_id]) || + vma->bo->ggtt_node[tile_id]->base.start != vma->node->base.start) xe_ggtt_node_remove(vma->node, false); ttm_bo_reserve(&vma->bo->ttm, false, false, NULL); diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h index 0c9e4b2fafab..162f18e975da 100644 --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h @@ -445,6 +445,8 @@ #define SAMPLER_MODE XE_REG_MCR(0xe18c, XE_REG_OPTION_MASKED) #define ENABLE_SMALLPL REG_BIT(15) +#define SMP_WAIT_FETCH_MERGING_COUNTER REG_GENMASK(11, 10) +#define SMP_FORCE_128B_OVERFETCH REG_FIELD_PREP(SMP_WAIT_FETCH_MERGING_COUNTER, 1) #define SC_DISABLE_POWER_OPTIMIZATION_EBB REG_BIT(9) #define SAMPLER_ENABLE_HEADLESS_MSG REG_BIT(5) #define INDIRECT_STATE_BASE_ADDR_OVERRIDE REG_BIT(0) diff --git a/drivers/gpu/drm/xe/regs/xe_oa_regs.h b/drivers/gpu/drm/xe/regs/xe_oa_regs.h index a9b0091cb7ee..a49561e9f3c3 100644 --- a/drivers/gpu/drm/xe/regs/xe_oa_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_oa_regs.h @@ -41,14 +41,6 @@ #define OAG_OABUFFER XE_REG(0xdb08) #define OABUFFER_SIZE_MASK REG_GENMASK(5, 3) -#define OABUFFER_SIZE_128K REG_FIELD_PREP(OABUFFER_SIZE_MASK, 0) -#define OABUFFER_SIZE_256K REG_FIELD_PREP(OABUFFER_SIZE_MASK, 1) -#define OABUFFER_SIZE_512K REG_FIELD_PREP(OABUFFER_SIZE_MASK, 2) -#define OABUFFER_SIZE_1M REG_FIELD_PREP(OABUFFER_SIZE_MASK, 3) -#define OABUFFER_SIZE_2M REG_FIELD_PREP(OABUFFER_SIZE_MASK, 4) -#define OABUFFER_SIZE_4M REG_FIELD_PREP(OABUFFER_SIZE_MASK, 5) -#define OABUFFER_SIZE_8M REG_FIELD_PREP(OABUFFER_SIZE_MASK, 6) -#define OABUFFER_SIZE_16M REG_FIELD_PREP(OABUFFER_SIZE_MASK, 7) #define OAG_OABUFFER_MEMORY_SELECT REG_BIT(0) /* 0: PPGTT, 1: GGTT */ #define OAG_OACONTROL XE_REG(0xdaf4) @@ -63,6 +55,7 @@ #define OAG_OA_DEBUG XE_REG(0xdaf8, XE_REG_OPTION_MASKED) #define OAG_OA_DEBUG_DISABLE_MMIO_TRG REG_BIT(14) #define OAG_OA_DEBUG_START_TRIGGER_SCOPE_CONTROL REG_BIT(13) +#define OAG_OA_DEBUG_BUF_SIZE_SELECT REG_BIT(12) #define OAG_OA_DEBUG_DISABLE_START_TRG_2_COUNT_QUAL REG_BIT(8) #define OAG_OA_DEBUG_DISABLE_START_TRG_1_COUNT_QUAL REG_BIT(7) #define OAG_OA_DEBUG_INCLUDE_CLK_RATIO REG_BIT(6) diff --git a/drivers/gpu/drm/xe/regs/xe_pmt.h b/drivers/gpu/drm/xe/regs/xe_pmt.h new file mode 100644 index 000000000000..f45abcd96ba8 --- /dev/null +++ b/drivers/gpu/drm/xe/regs/xe_pmt.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ +#ifndef _XE_PMT_H_ +#define _XE_PMT_H_ + +#define SOC_BASE 0x280000 + +#define BMG_PMT_BASE_OFFSET 0xDB000 +#define BMG_DISCOVERY_OFFSET (SOC_BASE + BMG_PMT_BASE_OFFSET) + +#define BMG_TELEMETRY_BASE_OFFSET 0xE0000 +#define BMG_TELEMETRY_OFFSET (SOC_BASE + BMG_TELEMETRY_BASE_OFFSET) + +#define SG_REMAP_INDEX1 XE_REG(SOC_BASE + 0x08) +#define SG_REMAP_BITS REG_GENMASK(31, 24) + +#endif diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c index 3e0ae40ebbd2..c9ec7a313c6b 100644 --- a/drivers/gpu/drm/xe/tests/xe_bo.c +++ b/drivers/gpu/drm/xe/tests/xe_bo.c @@ -49,6 +49,13 @@ static int ccs_test_migrate(struct xe_tile *tile, struct xe_bo *bo, KUNIT_FAIL(test, "Failed to submit bo clear.\n"); return PTR_ERR(fence); } + + if (dma_fence_wait_timeout(fence, false, 5 * HZ) <= 0) { + dma_fence_put(fence); + KUNIT_FAIL(test, "Timeout while clearing bo.\n"); + return -ETIME; + } + dma_fence_put(fence); } diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c index 3bbdb362d6f0..d5fe0ea889ad 100644 --- a/drivers/gpu/drm/xe/tests/xe_migrate.c +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c @@ -83,7 +83,8 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo, bo->size, ttm_bo_type_kernel, region | - XE_BO_FLAG_NEEDS_CPU_ACCESS); + XE_BO_FLAG_NEEDS_CPU_ACCESS | + XE_BO_FLAG_PINNED); if (IS_ERR(remote)) { KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n", str, remote); @@ -642,7 +643,9 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M, DRM_XE_GEM_CPU_CACHING_WC, - XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS); + XE_BO_FLAG_SYSTEM | + XE_BO_FLAG_NEEDS_CPU_ACCESS | + XE_BO_FLAG_PINNED); if (IS_ERR(sys_bo)) { KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n", @@ -666,7 +669,8 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til ccs_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M, DRM_XE_GEM_CPU_CACHING_WC, - bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS); + bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS | + XE_BO_FLAG_PINNED); if (IS_ERR(ccs_bo)) { KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n", @@ -690,7 +694,8 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til vram_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M, DRM_XE_GEM_CPU_CACHING_WC, - bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS); + bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS | + XE_BO_FLAG_PINNED); if (IS_ERR(vram_bo)) { KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n", PTR_ERR(vram_bo)); diff --git a/drivers/gpu/drm/xe/xe_assert.h b/drivers/gpu/drm/xe/xe_assert.h index 04d6b95c6d87..68fe70ce2be3 100644 --- a/drivers/gpu/drm/xe/xe_assert.h +++ b/drivers/gpu/drm/xe/xe_assert.h @@ -14,7 +14,7 @@ #include "xe_step.h" /** - * DOC: Xe ASSERTs + * DOC: Xe Asserts * * While Xe driver aims to be simpler than legacy i915 driver it is still * complex enough that some changes introduced while adding new functionality @@ -103,7 +103,7 @@ * (&CONFIG_DRM_XE_DEBUG must be enabled) and cannot be used in expressions * or as a condition. * - * See `Xe ASSERTs`_ for general usage guidelines. + * See `Xe Asserts`_ for general usage guidelines. */ #define xe_assert(xe, condition) xe_assert_msg((xe), condition, "") #define xe_assert_msg(xe, condition, msg, arg...) ({ \ @@ -138,7 +138,7 @@ * (&CONFIG_DRM_XE_DEBUG must be enabled) and cannot be used in expressions * or as a condition. * - * See `Xe ASSERTs`_ for general usage guidelines. + * See `Xe Asserts`_ for general usage guidelines. */ #define xe_tile_assert(tile, condition) xe_tile_assert_msg((tile), condition, "") #define xe_tile_assert_msg(tile, condition, msg, arg...) ({ \ @@ -162,7 +162,7 @@ * (&CONFIG_DRM_XE_DEBUG must be enabled) and cannot be used in expressions * or as a condition. * - * See `Xe ASSERTs`_ for general usage guidelines. + * See `Xe Asserts`_ for general usage guidelines. */ #define xe_gt_assert(gt, condition) xe_gt_assert_msg((gt), condition, "") #define xe_gt_assert_msg(gt, condition, msg, arg...) ({ \ diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index ae6b337cdc54..283cd0294570 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -162,6 +162,15 @@ static void try_add_system(struct xe_device *xe, struct xe_bo *bo, } } +static bool force_contiguous(u32 bo_flags) +{ + /* + * For eviction / restore on suspend / resume objects pinned in VRAM + * must be contiguous, also only contiguous BOs support xe_bo_vmap. + */ + return bo_flags & (XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT); +} + static void add_vram(struct xe_device *xe, struct xe_bo *bo, struct ttm_place *places, u32 bo_flags, u32 mem_type, u32 *c) { @@ -175,12 +184,7 @@ static void add_vram(struct xe_device *xe, struct xe_bo *bo, xe_assert(xe, vram && vram->usable_size); io_size = vram->io_size; - /* - * For eviction / restore on suspend / resume objects - * pinned in VRAM must be contiguous - */ - if (bo_flags & (XE_BO_FLAG_PINNED | - XE_BO_FLAG_GGTT)) + if (force_contiguous(bo_flags)) place.flags |= TTM_PL_FLAG_CONTIGUOUS; if (io_size < vram->usable_size) { @@ -212,8 +216,7 @@ static void try_add_stolen(struct xe_device *xe, struct xe_bo *bo, bo->placements[*c] = (struct ttm_place) { .mem_type = XE_PL_STOLEN, - .flags = bo_flags & (XE_BO_FLAG_PINNED | - XE_BO_FLAG_GGTT) ? + .flags = force_contiguous(bo_flags) ? TTM_PL_FLAG_CONTIGUOUS : 0, }; *c += 1; @@ -442,6 +445,14 @@ static void xe_ttm_tt_destroy(struct ttm_device *ttm_dev, struct ttm_tt *tt) kfree(tt); } +static bool xe_ttm_resource_visible(struct ttm_resource *mem) +{ + struct xe_ttm_vram_mgr_resource *vres = + to_xe_ttm_vram_mgr_resource(mem); + + return vres->used_visible_size == mem->size; +} + static int xe_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem) { @@ -453,11 +464,9 @@ static int xe_ttm_io_mem_reserve(struct ttm_device *bdev, return 0; case XE_PL_VRAM0: case XE_PL_VRAM1: { - struct xe_ttm_vram_mgr_resource *vres = - to_xe_ttm_vram_mgr_resource(mem); struct xe_mem_region *vram = res_to_mem_region(mem); - if (vres->used_visible_size < mem->size) + if (!xe_ttm_resource_visible(mem)) return -EINVAL; mem->bus.offset = mem->start << PAGE_SHIFT; @@ -876,6 +885,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo) }; struct ttm_operation_ctx ctx = { .interruptible = false, + .gfp_retry_mayfail = true, }; struct ttm_resource *new_mem; int ret; @@ -937,6 +947,7 @@ int xe_bo_restore_pinned(struct xe_bo *bo) { struct ttm_operation_ctx ctx = { .interruptible = false, + .gfp_retry_mayfail = false, }; struct ttm_resource *new_mem; struct ttm_place *place = &bo->placements[0]; @@ -1106,7 +1117,8 @@ static void xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operati static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo) { struct ttm_operation_ctx ctx = { - .interruptible = false + .interruptible = false, + .gfp_retry_mayfail = false, }; if (ttm_bo->ttm) { @@ -1118,6 +1130,52 @@ static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo) } } +static int xe_ttm_access_memory(struct ttm_buffer_object *ttm_bo, + unsigned long offset, void *buf, int len, + int write) +{ + struct xe_bo *bo = ttm_to_xe_bo(ttm_bo); + struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev); + struct iosys_map vmap; + struct xe_res_cursor cursor; + struct xe_mem_region *vram; + int bytes_left = len; + + xe_bo_assert_held(bo); + xe_device_assert_mem_access(xe); + + if (!mem_type_is_vram(ttm_bo->resource->mem_type)) + return -EIO; + + /* FIXME: Use GPU for non-visible VRAM */ + if (!xe_ttm_resource_visible(ttm_bo->resource)) + return -EIO; + + vram = res_to_mem_region(ttm_bo->resource); + xe_res_first(ttm_bo->resource, offset & PAGE_MASK, + bo->size - (offset & PAGE_MASK), &cursor); + + do { + unsigned long page_offset = (offset & ~PAGE_MASK); + int byte_count = min((int)(PAGE_SIZE - page_offset), bytes_left); + + iosys_map_set_vaddr_iomem(&vmap, (u8 __iomem *)vram->mapping + + cursor.start); + if (write) + xe_map_memcpy_to(xe, &vmap, page_offset, buf, byte_count); + else + xe_map_memcpy_from(xe, buf, &vmap, page_offset, byte_count); + + buf += byte_count; + offset += byte_count; + bytes_left -= byte_count; + if (bytes_left) + xe_res_next(&cursor, PAGE_SIZE); + } while (bytes_left); + + return len; +} + const struct ttm_device_funcs xe_ttm_funcs = { .ttm_tt_create = xe_ttm_tt_create, .ttm_tt_populate = xe_ttm_tt_populate, @@ -1127,6 +1185,7 @@ const struct ttm_device_funcs xe_ttm_funcs = { .move = xe_bo_move, .io_mem_reserve = xe_ttm_io_mem_reserve, .io_mem_pfn = xe_ttm_io_mem_pfn, + .access_memory = xe_ttm_access_memory, .release_notify = xe_ttm_bo_release_notify, .eviction_valuable = ttm_bo_eviction_valuable, .delete_mem_notify = xe_ttm_bo_delete_mem_notify, @@ -1137,6 +1196,8 @@ static void xe_ttm_bo_destroy(struct ttm_buffer_object *ttm_bo) { struct xe_bo *bo = ttm_to_xe_bo(ttm_bo); struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev); + struct xe_tile *tile; + u8 id; if (bo->ttm.base.import_attach) drm_prime_gem_destroy(&bo->ttm.base, NULL); @@ -1144,8 +1205,9 @@ static void xe_ttm_bo_destroy(struct ttm_buffer_object *ttm_bo) xe_assert(xe, list_empty(&ttm_bo->base.gpuva.list)); - if (bo->ggtt_node && bo->ggtt_node->base.size) - xe_ggtt_remove_bo(bo->tile->mem.ggtt, bo); + for_each_tile(tile, xe, id) + if (bo->ggtt_node[id] && bo->ggtt_node[id]->base.size) + xe_ggtt_remove_bo(tile->mem.ggtt, bo); #ifdef CONFIG_PROC_FS if (bo->client) @@ -1243,11 +1305,50 @@ out: return ret; } +static int xe_bo_vm_access(struct vm_area_struct *vma, unsigned long addr, + void *buf, int len, int write) +{ + struct ttm_buffer_object *ttm_bo = vma->vm_private_data; + struct xe_bo *bo = ttm_to_xe_bo(ttm_bo); + struct xe_device *xe = xe_bo_device(bo); + int ret; + + xe_pm_runtime_get(xe); + ret = ttm_bo_vm_access(vma, addr, buf, len, write); + xe_pm_runtime_put(xe); + + return ret; +} + +/** + * xe_bo_read() - Read from an xe_bo + * @bo: The buffer object to read from. + * @offset: The byte offset to start reading from. + * @dst: Location to store the read. + * @size: Size in bytes for the read. + * + * Read @size bytes from the @bo, starting from @offset, storing into @dst. + * + * Return: Zero on success, or negative error. + */ +int xe_bo_read(struct xe_bo *bo, u64 offset, void *dst, int size) +{ + int ret; + + ret = ttm_bo_access(&bo->ttm, offset, dst, size, 0); + if (ret >= 0 && ret != size) + ret = -EIO; + else if (ret == size) + ret = 0; + + return ret; +} + static const struct vm_operations_struct xe_gem_vm_ops = { .fault = xe_gem_fault, .open = ttm_bo_vm_open, .close = ttm_bo_vm_close, - .access = ttm_bo_vm_access + .access = xe_bo_vm_access, }; static const struct drm_gem_object_funcs xe_gem_object_funcs = { @@ -1301,6 +1402,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, struct ttm_operation_ctx ctx = { .interruptible = true, .no_wait_gpu = false, + .gfp_retry_mayfail = true, }; struct ttm_placement *placement; uint32_t alignment; @@ -1315,6 +1417,10 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, return ERR_PTR(-EINVAL); } + /* XE_BO_FLAG_GGTTx requires XE_BO_FLAG_GGTT also be set */ + if ((flags & XE_BO_FLAG_GGTT_ALL) && !(flags & XE_BO_FLAG_GGTT)) + return ERR_PTR(-EINVAL); + if (flags & (XE_BO_FLAG_VRAM_MASK | XE_BO_FLAG_STOLEN) && !(flags & XE_BO_FLAG_IGNORE_MIN_PAGE_SIZE) && ((xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K) || @@ -1505,19 +1611,29 @@ __xe_bo_create_locked(struct xe_device *xe, bo->vm = vm; if (bo->flags & XE_BO_FLAG_GGTT) { - if (!tile && flags & XE_BO_FLAG_STOLEN) - tile = xe_device_get_root_tile(xe); + struct xe_tile *t; + u8 id; - xe_assert(xe, tile); + if (!(bo->flags & XE_BO_FLAG_GGTT_ALL)) { + if (!tile && flags & XE_BO_FLAG_STOLEN) + tile = xe_device_get_root_tile(xe); - if (flags & XE_BO_FLAG_FIXED_PLACEMENT) { - err = xe_ggtt_insert_bo_at(tile->mem.ggtt, bo, - start + bo->size, U64_MAX); - } else { - err = xe_ggtt_insert_bo(tile->mem.ggtt, bo); + xe_assert(xe, tile); + } + + for_each_tile(t, xe, id) { + if (t != tile && !(bo->flags & XE_BO_FLAG_GGTTx(t))) + continue; + + if (flags & XE_BO_FLAG_FIXED_PLACEMENT) { + err = xe_ggtt_insert_bo_at(t->mem.ggtt, bo, + start + bo->size, U64_MAX); + } else { + err = xe_ggtt_insert_bo(t->mem.ggtt, bo); + } + if (err) + goto err_unlock_put_bo; } - if (err) - goto err_unlock_put_bo; } return bo; @@ -1900,6 +2016,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict) struct ttm_operation_ctx ctx = { .interruptible = true, .no_wait_gpu = false, + .gfp_retry_mayfail = true, }; if (vm) { @@ -1910,6 +2027,7 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict) ctx.resv = xe_vm_resv(vm); } + trace_xe_bo_validate(bo); return ttm_bo_validate(&bo->ttm, &bo->placement, &ctx); } @@ -1961,13 +2079,15 @@ dma_addr_t xe_bo_addr(struct xe_bo *bo, u64 offset, size_t page_size) int xe_bo_vmap(struct xe_bo *bo) { + struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev); void *virtual; bool is_iomem; int ret; xe_bo_assert_held(bo); - if (!(bo->flags & XE_BO_FLAG_NEEDS_CPU_ACCESS)) + if (drm_WARN_ON(&xe->drm, !(bo->flags & XE_BO_FLAG_NEEDS_CPU_ACCESS) || + !force_contiguous(bo->flags))) return -EINVAL; if (!iosys_map_is_null(&bo->vmap)) @@ -2243,6 +2363,7 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type) struct ttm_operation_ctx ctx = { .interruptible = true, .no_wait_gpu = false, + .gfp_retry_mayfail = true, }; struct ttm_placement placement; struct ttm_place requested; @@ -2293,6 +2414,7 @@ int xe_bo_evict(struct xe_bo *bo, bool force_alloc) .interruptible = false, .no_wait_gpu = false, .force_alloc = force_alloc, + .gfp_retry_mayfail = true, }; struct ttm_placement placement; int ret; @@ -2372,14 +2494,18 @@ void xe_bo_put_commit(struct llist_head *deferred) void xe_bo_put(struct xe_bo *bo) { + struct xe_tile *tile; + u8 id; + might_sleep(); if (bo) { #ifdef CONFIG_PROC_FS if (bo->client) might_lock(&bo->client->bos_lock); #endif - if (bo->ggtt_node && bo->ggtt_node->ggtt) - might_lock(&bo->ggtt_node->ggtt->lock); + for_each_tile(tile, xe_bo_device(bo), id) + if (bo->ggtt_node[id] && bo->ggtt_node[id]->ggtt) + might_lock(&bo->ggtt_node[id]->ggtt->lock); drm_gem_object_put(&bo->ttm.base); } } diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h index 7fa44a0138b0..d9386ab03140 100644 --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -39,10 +39,22 @@ #define XE_BO_FLAG_NEEDS_64K BIT(15) #define XE_BO_FLAG_NEEDS_2M BIT(16) #define XE_BO_FLAG_GGTT_INVALIDATE BIT(17) +#define XE_BO_FLAG_GGTT0 BIT(18) +#define XE_BO_FLAG_GGTT1 BIT(19) +#define XE_BO_FLAG_GGTT2 BIT(20) +#define XE_BO_FLAG_GGTT3 BIT(21) +#define XE_BO_FLAG_GGTT_ALL (XE_BO_FLAG_GGTT0 | \ + XE_BO_FLAG_GGTT1 | \ + XE_BO_FLAG_GGTT2 | \ + XE_BO_FLAG_GGTT3) + /* this one is trigger internally only */ #define XE_BO_FLAG_INTERNAL_TEST BIT(30) #define XE_BO_FLAG_INTERNAL_64K BIT(31) +#define XE_BO_FLAG_GGTTx(tile) \ + (XE_BO_FLAG_GGTT0 << (tile)->id) + #define XE_PTE_SHIFT 12 #define XE_PAGE_SIZE (1 << XE_PTE_SHIFT) #define XE_PTE_MASK (XE_PAGE_SIZE - 1) @@ -194,18 +206,29 @@ xe_bo_main_addr(struct xe_bo *bo, size_t page_size) } static inline u32 -xe_bo_ggtt_addr(struct xe_bo *bo) +__xe_bo_ggtt_addr(struct xe_bo *bo, u8 tile_id) { - if (XE_WARN_ON(!bo->ggtt_node)) + struct xe_ggtt_node *ggtt_node = bo->ggtt_node[tile_id]; + + if (XE_WARN_ON(!ggtt_node)) return 0; - XE_WARN_ON(bo->ggtt_node->base.size > bo->size); - XE_WARN_ON(bo->ggtt_node->base.start + bo->ggtt_node->base.size > (1ull << 32)); - return bo->ggtt_node->base.start; + XE_WARN_ON(ggtt_node->base.size > bo->size); + XE_WARN_ON(ggtt_node->base.start + ggtt_node->base.size > (1ull << 32)); + return ggtt_node->base.start; +} + +static inline u32 +xe_bo_ggtt_addr(struct xe_bo *bo) +{ + xe_assert(xe_bo_device(bo), bo->tile); + + return __xe_bo_ggtt_addr(bo, bo->tile->id); } int xe_bo_vmap(struct xe_bo *bo); void xe_bo_vunmap(struct xe_bo *bo); +int xe_bo_read(struct xe_bo *bo, u64 offset, void *dst, int size); bool mem_type_is_vram(u32 mem_type); bool xe_bo_is_vram(struct xe_bo *bo); diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c index 8fb2be061003..6a40eedd9db1 100644 --- a/drivers/gpu/drm/xe/xe_bo_evict.c +++ b/drivers/gpu/drm/xe/xe_bo_evict.c @@ -152,11 +152,17 @@ int xe_bo_restore_kernel(struct xe_device *xe) } if (bo->flags & XE_BO_FLAG_GGTT) { - struct xe_tile *tile = bo->tile; + struct xe_tile *tile; + u8 id; - mutex_lock(&tile->mem.ggtt->lock); - xe_ggtt_map_bo(tile->mem.ggtt, bo); - mutex_unlock(&tile->mem.ggtt->lock); + for_each_tile(tile, xe, id) { + if (tile != bo->tile && !(bo->flags & XE_BO_FLAG_GGTTx(tile))) + continue; + + mutex_lock(&tile->mem.ggtt->lock); + xe_ggtt_map_bo(tile->mem.ggtt, bo); + mutex_unlock(&tile->mem.ggtt->lock); + } } /* diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h index 13c6d8a69e91..46dc9e4e3e46 100644 --- a/drivers/gpu/drm/xe/xe_bo_types.h +++ b/drivers/gpu/drm/xe/xe_bo_types.h @@ -10,9 +10,9 @@ #include <drm/ttm/ttm_bo.h> #include <drm/ttm/ttm_device.h> -#include <drm/ttm/ttm_execbuf_util.h> #include <drm/ttm/ttm_placement.h> +#include "xe_device_types.h" #include "xe_ggtt_types.h" struct xe_device; @@ -39,8 +39,8 @@ struct xe_bo { struct ttm_place placements[XE_BO_MAX_PLACEMENTS]; /** @placement: current placement for this BO */ struct ttm_placement placement; - /** @ggtt_node: GGTT node if this BO is mapped in the GGTT */ - struct xe_ggtt_node *ggtt_node; + /** @ggtt_node: Array of GGTT nodes if this BO is mapped in the GGTTs */ + struct xe_ggtt_node *ggtt_node[XE_MAX_TILES_PER_DEVICE]; /** @vmap: iosys map of this buffer */ struct iosys_map vmap; /** @ttm_kmap: TTM bo kmap object for internal use only. Keep off. */ diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index f8947e7e917e..71636e80b71d 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -30,30 +30,39 @@ /** * DOC: Xe device coredump * - * Devices overview: * Xe uses dev_coredump infrastructure for exposing the crash errors in a - * standardized way. - * devcoredump exposes a temporary device under /sys/class/devcoredump/ - * which is linked with our card device directly. - * The core dump can be accessed either from - * /sys/class/drm/card<n>/device/devcoredump/ or from - * /sys/class/devcoredump/devcd<m> where - * /sys/class/devcoredump/devcd<m>/failing_device is a link to - * /sys/class/drm/card<n>/device/. + * standardized way. Once a crash occurs, devcoredump exposes a temporary + * node under ``/sys/class/devcoredump/devcd<m>/``. The same node is also + * accessible in ``/sys/class/drm/card<n>/device/devcoredump/``. The + * ``failing_device`` symlink points to the device that crashed and created the + * coredump. * - * Snapshot at hang: - * The 'data' file is printed with a drm_printer pointer at devcoredump read - * time. For this reason, we need to take snapshots from when the hang has - * happened, and not only when the user is reading the file. Otherwise the - * information is outdated since the resets might have happened in between. + * The following characteristics are observed by xe when creating a device + * coredump: * - * 'First' failure snapshot: - * In general, the first hang is the most critical one since the following hangs - * can be a consequence of the initial hang. For this reason we only take the - * snapshot of the 'first' failure and ignore subsequent calls of this function, - * at least while the coredump device is alive. Dev_coredump has a delayed work - * queue that will eventually delete the device and free all the dump - * information. + * **Snapshot at hang**: + * The 'data' file contains a snapshot of the HW and driver states at the time + * the hang happened. Due to the driver recovering from resets/crashes, it may + * not correspond to the state of the system when the file is read by + * userspace. + * + * **Coredump release**: + * After a coredump is generated, it stays in kernel memory until released by + * userpace by writing anything to it, or after an internal timer expires. The + * exact timeout may vary and should not be relied upon. Example to release + * a coredump: + * + * .. code-block:: shell + * + * $ > /sys/class/drm/card0/device/devcoredump/data + * + * **First failure only**: + * In general, the first hang is the most critical one since the following + * hangs can be a consequence of the initial hang. For this reason a snapshot + * is taken only for the first failure. Until the devcoredump is released by + * userspace or kernel, all subsequent hangs do not override the snapshot nor + * create new ones. Devcoredump has a delayed work queue that will eventually + * delete the file node and free all the dump information. */ #ifdef CONFIG_DEV_COREDUMP @@ -91,6 +100,7 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count, p = drm_coredump_printer(&iter); drm_puts(&p, "**** Xe Device Coredump ****\n"); + drm_printf(&p, "Reason: %s\n", ss->reason); drm_puts(&p, "kernel: " UTS_RELEASE "\n"); drm_puts(&p, "module: " KBUILD_MODNAME "\n"); @@ -98,7 +108,7 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count, drm_printf(&p, "Snapshot time: %lld.%09ld\n", ts.tv_sec, ts.tv_nsec); ts = ktime_to_timespec64(ss->boot_time); drm_printf(&p, "Uptime: %lld.%09ld\n", ts.tv_sec, ts.tv_nsec); - drm_printf(&p, "Process: %s\n", ss->process_name); + drm_printf(&p, "Process: %s [%d]\n", ss->process_name, ss->pid); xe_device_snapshot_print(xe, &p); drm_printf(&p, "\n**** GT #%d ****\n", ss->gt->info.id); @@ -130,6 +140,9 @@ static void xe_devcoredump_snapshot_free(struct xe_devcoredump_snapshot *ss) { int i; + kfree(ss->reason); + ss->reason = NULL; + xe_guc_log_snapshot_free(ss->guc.log); ss->guc.log = NULL; @@ -170,16 +183,24 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset, /* Ensure delayed work is captured before continuing */ flush_work(&ss->work); - if (!ss->read.buffer) + mutex_lock(&coredump->lock); + + if (!ss->read.buffer) { + mutex_unlock(&coredump->lock); return -ENODEV; + } - if (offset >= ss->read.size) + if (offset >= ss->read.size) { + mutex_unlock(&coredump->lock); return 0; + } byte_copied = count < ss->read.size - offset ? count : ss->read.size - offset; memcpy(buffer, ss->read.buffer + offset, byte_copied); + mutex_unlock(&coredump->lock); + return byte_copied; } @@ -193,15 +214,18 @@ static void xe_devcoredump_free(void *data) cancel_work_sync(&coredump->snapshot.work); + mutex_lock(&coredump->lock); + xe_devcoredump_snapshot_free(&coredump->snapshot); kvfree(coredump->snapshot.read.buffer); /* To prevent stale data on next snapshot, clear everything */ memset(&coredump->snapshot, 0, sizeof(coredump->snapshot)); coredump->captured = false; - coredump->job = NULL; drm_info(&coredump_to_xe(coredump)->drm, "Xe device coredump has been deleted.\n"); + + mutex_unlock(&coredump->lock); } static void xe_devcoredump_deferred_snap_work(struct work_struct *work) @@ -244,10 +268,10 @@ static void xe_devcoredump_deferred_snap_work(struct work_struct *work) } static void devcoredump_snapshot(struct xe_devcoredump *coredump, + struct xe_exec_queue *q, struct xe_sched_job *job) { struct xe_devcoredump_snapshot *ss = &coredump->snapshot; - struct xe_exec_queue *q = job->q; struct xe_guc *guc = exec_queue_to_guc(q); u32 adj_logical_mask = q->logical_mask; u32 width_mask = (0x1 << q->width) - 1; @@ -260,12 +284,14 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump, ss->snapshot_time = ktime_get_real(); ss->boot_time = ktime_get_boottime(); - if (q->vm && q->vm->xef) + if (q->vm && q->vm->xef) { process_name = q->vm->xef->process_name; + ss->pid = q->vm->xef->pid; + } + strscpy(ss->process_name, process_name); ss->gt = q->gt; - coredump->job = job; INIT_WORK(&ss->work, xe_devcoredump_deferred_snap_work); cookie = dma_fence_begin_signalling(); @@ -284,10 +310,11 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump, ss->guc.log = xe_guc_log_snapshot_capture(&guc->log, true); ss->guc.ct = xe_guc_ct_snapshot_capture(&guc->ct); ss->ge = xe_guc_exec_queue_snapshot_capture(q); - ss->job = xe_sched_job_snapshot_capture(job); + if (job) + ss->job = xe_sched_job_snapshot_capture(job); ss->vm = xe_vm_snapshot_capture(q->vm); - xe_engine_snapshot_capture_for_job(job); + xe_engine_snapshot_capture_for_queue(q); queue_work(system_unbound_wq, &ss->work); @@ -297,28 +324,42 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump, /** * xe_devcoredump - Take the required snapshots and initialize coredump device. + * @q: The faulty xe_exec_queue, where the issue was detected. * @job: The faulty xe_sched_job, where the issue was detected. + * @fmt: Printf format + args to describe the reason for the core dump * * This function should be called at the crash time within the serialized * gt_reset. It is skipped if we still have the core dump device available * with the information of the 'first' snapshot. */ -void xe_devcoredump(struct xe_sched_job *job) +__printf(3, 4) +void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job, const char *fmt, ...) { - struct xe_device *xe = gt_to_xe(job->q->gt); + struct xe_device *xe = gt_to_xe(q->gt); struct xe_devcoredump *coredump = &xe->devcoredump; + va_list varg; + + mutex_lock(&coredump->lock); if (coredump->captured) { drm_dbg(&xe->drm, "Multiple hangs are occurring, but only the first snapshot was taken\n"); + mutex_unlock(&coredump->lock); return; } coredump->captured = true; - devcoredump_snapshot(coredump, job); + + va_start(varg, fmt); + coredump->snapshot.reason = kvasprintf(GFP_ATOMIC, fmt, varg); + va_end(varg); + + devcoredump_snapshot(coredump, q, job); drm_info(&xe->drm, "Xe device coredump has been created\n"); drm_info(&xe->drm, "Check your /sys/class/drm/card%d/device/devcoredump/data\n", xe->drm.primary->index); + + mutex_unlock(&coredump->lock); } static void xe_driver_devcoredump_fini(void *arg) @@ -330,6 +371,18 @@ static void xe_driver_devcoredump_fini(void *arg) int xe_devcoredump_init(struct xe_device *xe) { + int err; + + err = drmm_mutex_init(&xe->drm, &xe->devcoredump.lock); + if (err) + return err; + + if (IS_ENABLED(CONFIG_LOCKDEP)) { + fs_reclaim_acquire(GFP_KERNEL); + might_lock(&xe->devcoredump.lock); + fs_reclaim_release(GFP_KERNEL); + } + return devm_add_action_or_reset(xe->drm.dev, xe_driver_devcoredump_fini, &xe->drm); } diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h index a4eebc285fc8..6a17e6d60102 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.h +++ b/drivers/gpu/drm/xe/xe_devcoredump.h @@ -10,13 +10,16 @@ struct drm_printer; struct xe_device; +struct xe_exec_queue; struct xe_sched_job; #ifdef CONFIG_DEV_COREDUMP -void xe_devcoredump(struct xe_sched_job *job); +void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job, const char *fmt, ...); int xe_devcoredump_init(struct xe_device *xe); #else -static inline void xe_devcoredump(struct xe_sched_job *job) +static inline void xe_devcoredump(struct xe_exec_queue *q, + struct xe_sched_job *job, + const char *fmt, ...) { } diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h index 3703ddea1252..1a1d16a96b2d 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h @@ -28,6 +28,10 @@ struct xe_devcoredump_snapshot { ktime_t boot_time; /** @process_name: Name of process that triggered this gpu hang */ char process_name[TASK_COMM_LEN]; + /** @pid: Process id of process that triggered this gpu hang */ + pid_t pid; + /** @reason: The reason the coredump was triggered */ + char *reason; /** @gt: Affected GT, used by forcewake for delayed capture */ struct xe_gt *gt; @@ -76,12 +80,12 @@ struct xe_devcoredump_snapshot { * for reading the information. */ struct xe_devcoredump { - /** @captured: The snapshot of the first hang has already been taken. */ + /** @lock: protects access to entire structure */ + struct mutex lock; + /** @captured: The snapshot of the first hang has already been taken */ bool captured; /** @snapshot: Snapshot is captured at time of the first crash */ struct xe_devcoredump_snapshot snapshot; - /** @job: Point to the faulting job */ - struct xe_sched_job *job; }; #endif diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 06d6db8b50f9..56d4ffb650da 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -44,6 +44,7 @@ #include "xe_memirq.h" #include "xe_mmio.h" #include "xe_module.h" +#include "xe_oa.h" #include "xe_observation.h" #include "xe_pat.h" #include "xe_pcode.h" @@ -55,6 +56,7 @@ #include "xe_ttm_sys_mgr.h" #include "xe_vm.h" #include "xe_vram.h" +#include "xe_vsec.h" #include "xe_wait_user_fence.h" #include "xe_wa.h" @@ -269,7 +271,6 @@ static struct drm_driver driver = { .fops = &xe_driver_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, @@ -366,6 +367,10 @@ struct xe_device *xe_device_create(struct pci_dev *pdev, goto err; } + err = drmm_mutex_init(&xe->drm, &xe->pmt.lock); + if (err) + goto err; + err = xe_display_create(xe); if (WARN_ON(err)) goto err; @@ -760,6 +765,8 @@ int xe_device_probe(struct xe_device *xe) for_each_gt(gt, xe, id) xe_gt_sanitize_freq(gt); + xe_vsec_init(xe); + return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe); err_fini_display: diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h index b9ea455d6f59..ace22e35e769 100644 --- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -16,7 +16,7 @@ #include "xe_heci_gsc.h" #include "xe_lmtt_types.h" #include "xe_memirq_types.h" -#include "xe_oa.h" +#include "xe_oa_types.h" #include "xe_platform_types.h" #include "xe_pt_types.h" #include "xe_sriov_types.h" @@ -42,8 +42,6 @@ struct xe_pat_ops; #define GRAPHICS_VERx100(xe) ((xe)->info.graphics_verx100) #define MEDIA_VERx100(xe) ((xe)->info.media_verx100) #define IS_DGFX(xe) ((xe)->info.is_dgfx) -#define HAS_HECI_GSCFI(xe) ((xe)->info.has_heci_gscfi) -#define HAS_HECI_CSCFI(xe) ((xe)->info.has_heci_cscfi) #define XE_VRAM_FLAGS_NEED64K BIT(0) @@ -296,14 +294,24 @@ struct xe_device { /** @info.va_bits: Maximum bits of a virtual address */ u8 va_bits; - /** @info.is_dgfx: is discrete device */ - u8 is_dgfx:1; - /** @info.has_asid: Has address space ID */ - u8 has_asid:1; + /* + * Keep all flags below alphabetically sorted + */ + /** @info.force_execlist: Forced execlist submission */ u8 force_execlist:1; + /** @info.has_asid: Has address space ID */ + u8 has_asid:1; + /** @info.has_atomic_enable_pte_bit: Device has atomic enable PTE bit */ + u8 has_atomic_enable_pte_bit:1; + /** @info.has_device_atomics_on_smem: Supports device atomics on SMEM */ + u8 has_device_atomics_on_smem:1; /** @info.has_flat_ccs: Whether flat CCS metadata is used */ u8 has_flat_ccs:1; + /** @info.has_heci_cscfi: device has heci cscfi */ + u8 has_heci_cscfi:1; + /** @info.has_heci_gscfi: device has heci gscfi */ + u8 has_heci_gscfi:1; /** @info.has_llc: Device has a shared CPU+GPU last level cache */ u8 has_llc:1; /** @info.has_mmio_ext: Device has extra MMIO address range */ @@ -314,6 +322,8 @@ struct xe_device { u8 has_sriov:1; /** @info.has_usm: Device has unified shared memory support */ u8 has_usm:1; + /** @info.is_dgfx: is discrete device */ + u8 is_dgfx:1; /** * @info.probe_display: Probe display hardware. If set to * false, the driver will behave as if there is no display @@ -323,20 +333,12 @@ struct xe_device { * state the firmware or bootloader left it in. */ u8 probe_display:1; + /** @info.skip_guc_pc: Skip GuC based PM feature init */ + u8 skip_guc_pc:1; /** @info.skip_mtcfg: skip Multi-Tile configuration from MTCFG register */ u8 skip_mtcfg:1; /** @info.skip_pcode: skip access to PCODE uC */ u8 skip_pcode:1; - /** @info.has_heci_gscfi: device has heci gscfi */ - u8 has_heci_gscfi:1; - /** @info.has_heci_cscfi: device has heci cscfi */ - u8 has_heci_cscfi:1; - /** @info.skip_guc_pc: Skip GuC based PM feature init */ - u8 skip_guc_pc:1; - /** @info.has_atomic_enable_pte_bit: Device has atomic enable PTE bit */ - u8 has_atomic_enable_pte_bit:1; - /** @info.has_device_atomics_on_smem: Supports device atomics on SMEM */ - u8 has_device_atomics_on_smem:1; } info; /** @irq: device interrupt state */ @@ -345,7 +347,7 @@ struct xe_device { spinlock_t lock; /** @irq.enabled: interrupts enabled on this device */ - bool enabled; + atomic_t enabled; } irq; /** @ttm: ttm device */ @@ -374,6 +376,8 @@ struct xe_device { /** @sriov.pf: PF specific data */ struct xe_device_pf pf; + /** @sriov.vf: VF specific data */ + struct xe_device_vf vf; /** @sriov.wq: workqueue used by the virtualization workers */ struct workqueue_struct *wq; @@ -481,6 +485,12 @@ struct xe_device { struct mutex lock; } d3cold; + /** @pmt: Support the PMT driver callback interface */ + struct { + /** @pmt.lock: protect access for telemetry data */ + struct mutex lock; + } pmt; + /** * @pm_callback_task: Track the active task that is running in either * the runtime_suspend or runtime_resume callbacks. @@ -588,7 +598,7 @@ struct xe_file { /** @vm.xe: xarray to store VMs */ struct xarray xa; /** - * @vm.lock: Protects VM lookup + reference and removal a from + * @vm.lock: Protects VM lookup + reference and removal from * file xarray. Not an intended to be an outer lock which does * thing while being held. */ @@ -601,10 +611,15 @@ struct xe_file { struct xarray xa; /** * @exec_queue.lock: Protects exec queue lookup + reference and - * removal a frommfile xarray. Not an intended to be an outer - * lock which does thing while being held. + * removal from file xarray. Not intended to be an outer lock + * which does things while being held. */ struct mutex lock; + /** + * @exec_queue.pending_removal: items pending to be removed to + * synchronize GPU state update with ongoing query. + */ + atomic_t pending_removal; } exec_queue; /** @run_ticks: hw engine class run time in ticks for this drm client */ diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c index 22f0f1a6dfd5..298a587da7f1 100644 --- a/drivers/gpu/drm/xe/xe_drm_client.c +++ b/drivers/gpu/drm/xe/xe_drm_client.c @@ -269,6 +269,49 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file) } } +static struct xe_hw_engine *any_engine(struct xe_device *xe) +{ + struct xe_gt *gt; + unsigned long gt_id; + + for_each_gt(gt, xe, gt_id) { + struct xe_hw_engine *hwe = xe_gt_any_hw_engine(gt); + + if (hwe) + return hwe; + } + + return NULL; +} + +static bool force_wake_get_any_engine(struct xe_device *xe, + struct xe_hw_engine **phwe, + unsigned int *pfw_ref) +{ + enum xe_force_wake_domains domain; + unsigned int fw_ref; + struct xe_hw_engine *hwe; + struct xe_force_wake *fw; + + hwe = any_engine(xe); + if (!hwe) + return false; + + domain = xe_hw_engine_to_fw_domain(hwe); + fw = gt_to_fw(hwe->gt); + + fw_ref = xe_force_wake_get(fw, domain); + if (!xe_force_wake_ref_has_domain(fw_ref, domain)) { + xe_force_wake_put(fw, fw_ref); + return false; + } + + *phwe = hwe; + *pfw_ref = fw_ref; + + return true; +} + static void show_run_ticks(struct drm_printer *p, struct drm_file *file) { unsigned long class, i, gt_id, capacity[XE_ENGINE_CLASS_MAX] = { }; @@ -280,7 +323,18 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file) u64 gpu_timestamp; unsigned int fw_ref; + /* + * Wait for any exec queue going away: their cycles will get updated on + * context switch out, so wait for that to happen + */ + wait_var_event(&xef->exec_queue.pending_removal, + !atomic_read(&xef->exec_queue.pending_removal)); + xe_pm_runtime_get(xe); + if (!force_wake_get_any_engine(xe, &hwe, &fw_ref)) { + xe_pm_runtime_put(xe); + return; + } /* Accumulate all the exec queues from this client */ mutex_lock(&xef->exec_queue.lock); @@ -295,33 +349,11 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file) } mutex_unlock(&xef->exec_queue.lock); - /* Get the total GPU cycles */ - for_each_gt(gt, xe, gt_id) { - enum xe_force_wake_domains fw; - - hwe = xe_gt_any_hw_engine(gt); - if (!hwe) - continue; - - fw = xe_hw_engine_to_fw_domain(hwe); - - fw_ref = xe_force_wake_get(gt_to_fw(gt), fw); - if (!xe_force_wake_ref_has_domain(fw_ref, fw)) { - hwe = NULL; - xe_force_wake_put(gt_to_fw(gt), fw_ref); - break; - } - - gpu_timestamp = xe_hw_engine_read_timestamp(hwe); - xe_force_wake_put(gt_to_fw(gt), fw_ref); - break; - } + gpu_timestamp = xe_hw_engine_read_timestamp(hwe); + xe_force_wake_put(gt_to_fw(hwe->gt), fw_ref); xe_pm_runtime_put(xe); - if (unlikely(!hwe)) - return; - for (class = 0; class < XE_ENGINE_CLASS_MAX; class++) { const char *class_name; diff --git a/drivers/gpu/drm/xe/xe_drv.h b/drivers/gpu/drm/xe/xe_drv.h index d45b71426cc8..d61650d4aa0b 100644 --- a/drivers/gpu/drm/xe/xe_drv.h +++ b/drivers/gpu/drm/xe/xe_drv.h @@ -10,7 +10,6 @@ #define DRIVER_NAME "xe" #define DRIVER_DESC "Intel Xe Graphics" -#define DRIVER_DATE "20201103" /* Interface history: * diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index fd0f3b3c9101..aab9e561153d 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -240,6 +240,7 @@ struct xe_exec_queue *xe_exec_queue_create_bind(struct xe_device *xe, return q; } +ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO); void xe_exec_queue_destroy(struct kref *ref) { @@ -262,8 +263,11 @@ void xe_exec_queue_fini(struct xe_exec_queue *q) /* * Before releasing our ref to lrc and xef, accumulate our run ticks + * and wakeup any waiters. */ xe_exec_queue_update_run_ticks(q); + if (q->xef && atomic_dec_and_test(&q->xef->exec_queue.pending_removal)) + wake_up_var(&q->xef->exec_queue.pending_removal); for (i = 0; i < q->width; ++i) xe_lrc_put(q->lrc[i]); @@ -826,7 +830,10 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data, mutex_lock(&xef->exec_queue.lock); q = xa_erase(&xef->exec_queue.xa, args->exec_queue_id); + if (q) + atomic_inc(&xef->exec_queue.pending_removal); mutex_unlock(&xef->exec_queue.lock); + if (XE_IOCTL_DBG(xe, !q)) return -ENOENT; diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c index 558fac8bb6fb..05154f9de1a6 100644 --- a/drivers/gpu/drm/xe/xe_ggtt.c +++ b/drivers/gpu/drm/xe/xe_ggtt.c @@ -598,10 +598,10 @@ void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_bo *bo) u64 start; u64 offset, pte; - if (XE_WARN_ON(!bo->ggtt_node)) + if (XE_WARN_ON(!bo->ggtt_node[ggtt->tile->id])) return; - start = bo->ggtt_node->base.start; + start = bo->ggtt_node[ggtt->tile->id]->base.start; for (offset = 0; offset < bo->size; offset += XE_PAGE_SIZE) { pte = ggtt->pt_ops->pte_encode_bo(bo, offset, pat_index); @@ -612,15 +612,16 @@ void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_bo *bo) static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo, u64 start, u64 end) { - int err; u64 alignment = bo->min_align > 0 ? bo->min_align : XE_PAGE_SIZE; + u8 tile_id = ggtt->tile->id; + int err; if (xe_bo_is_vram(bo) && ggtt->flags & XE_GGTT_FLAGS_64K) alignment = SZ_64K; - if (XE_WARN_ON(bo->ggtt_node)) { + if (XE_WARN_ON(bo->ggtt_node[tile_id])) { /* Someone's already inserted this BO in the GGTT */ - xe_tile_assert(ggtt->tile, bo->ggtt_node->base.size == bo->size); + xe_tile_assert(ggtt->tile, bo->ggtt_node[tile_id]->base.size == bo->size); return 0; } @@ -630,19 +631,19 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo, xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile)); - bo->ggtt_node = xe_ggtt_node_init(ggtt); - if (IS_ERR(bo->ggtt_node)) { - err = PTR_ERR(bo->ggtt_node); - bo->ggtt_node = NULL; + bo->ggtt_node[tile_id] = xe_ggtt_node_init(ggtt); + if (IS_ERR(bo->ggtt_node[tile_id])) { + err = PTR_ERR(bo->ggtt_node[tile_id]); + bo->ggtt_node[tile_id] = NULL; goto out; } mutex_lock(&ggtt->lock); - err = drm_mm_insert_node_in_range(&ggtt->mm, &bo->ggtt_node->base, bo->size, - alignment, 0, start, end, 0); + err = drm_mm_insert_node_in_range(&ggtt->mm, &bo->ggtt_node[tile_id]->base, + bo->size, alignment, 0, start, end, 0); if (err) { - xe_ggtt_node_fini(bo->ggtt_node); - bo->ggtt_node = NULL; + xe_ggtt_node_fini(bo->ggtt_node[tile_id]); + bo->ggtt_node[tile_id] = NULL; } else { xe_ggtt_map_bo(ggtt, bo); } @@ -691,13 +692,15 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo) */ void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo) { - if (XE_WARN_ON(!bo->ggtt_node)) + u8 tile_id = ggtt->tile->id; + + if (XE_WARN_ON(!bo->ggtt_node[tile_id])) return; /* This BO is not currently in the GGTT */ - xe_tile_assert(ggtt->tile, bo->ggtt_node->base.size == bo->size); + xe_tile_assert(ggtt->tile, bo->ggtt_node[tile_id]->base.size == bo->size); - xe_ggtt_node_remove(bo->ggtt_node, + xe_ggtt_node_remove(bo->ggtt_node[tile_id], bo->flags & XE_BO_FLAG_GGTT_INVALIDATE); } diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h index 64b2ae6839db..c250ea773491 100644 --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h @@ -71,8 +71,14 @@ static inline void xe_sched_add_pending_job(struct xe_gpu_scheduler *sched, static inline struct xe_sched_job *xe_sched_first_pending_job(struct xe_gpu_scheduler *sched) { - return list_first_entry_or_null(&sched->base.pending_list, - struct xe_sched_job, drm.list); + struct xe_sched_job *job; + + spin_lock(&sched->base.job_list_lock); + job = list_first_entry_or_null(&sched->base.pending_list, + struct xe_sched_job, drm.list); + spin_unlock(&sched->base.job_list_lock); + + return job; } static inline int diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.c b/drivers/gpu/drm/xe/xe_gsc_proxy.c index fc64b45d324b..24cc6a4f9a96 100644 --- a/drivers/gpu/drm/xe/xe_gsc_proxy.c +++ b/drivers/gpu/drm/xe/xe_gsc_proxy.c @@ -139,17 +139,29 @@ static int proxy_send_to_gsc(struct xe_gsc *gsc, u32 size) return 0; } -static int validate_proxy_header(struct xe_gsc_proxy_header *header, +static int validate_proxy_header(struct xe_gt *gt, + struct xe_gsc_proxy_header *header, u32 source, u32 dest, u32 max_size) { u32 type = FIELD_GET(GSC_PROXY_TYPE, header->hdr); u32 length = FIELD_GET(GSC_PROXY_PAYLOAD_LENGTH, header->hdr); + int ret = 0; - if (header->destination != dest || header->source != source) - return -ENOEXEC; + if (header->destination != dest || header->source != source) { + ret = -ENOEXEC; + goto out; + } - if (length + PROXY_HDR_SIZE > max_size) - return -E2BIG; + if (length + PROXY_HDR_SIZE > max_size) { + ret = -E2BIG; + goto out; + } + + /* We only care about the status if this is a message for the driver */ + if (dest == GSC_PROXY_ADDRESSING_KMD && header->status != 0) { + ret = -EIO; + goto out; + } switch (type) { case GSC_PROXY_MSG_TYPE_PROXY_PAYLOAD: @@ -157,12 +169,20 @@ static int validate_proxy_header(struct xe_gsc_proxy_header *header, break; fallthrough; case GSC_PROXY_MSG_TYPE_PROXY_INVALID: - return -EIO; + ret = -EIO; + break; default: break; } - return 0; +out: + if (ret) + xe_gt_err(gt, + "GSC proxy error: s=0x%x[0x%x], d=0x%x[0x%x], t=%u, l=0x%x, st=0x%x\n", + header->source, source, header->destination, dest, + type, length, header->status); + + return ret; } #define proxy_header_wr(xe_, map_, offset_, field_, val_) \ @@ -228,12 +248,17 @@ static int proxy_query(struct xe_gsc *gsc) xe_map_memcpy_from(xe, to_csme_hdr, &gsc->proxy.from_gsc, reply_offset, PROXY_HDR_SIZE); - /* stop if this was the last message */ - if (FIELD_GET(GSC_PROXY_TYPE, to_csme_hdr->hdr) == GSC_PROXY_MSG_TYPE_PROXY_END) + /* Check the status and stop if this was the last message */ + if (FIELD_GET(GSC_PROXY_TYPE, to_csme_hdr->hdr) == GSC_PROXY_MSG_TYPE_PROXY_END) { + ret = validate_proxy_header(gt, to_csme_hdr, + GSC_PROXY_ADDRESSING_GSC, + GSC_PROXY_ADDRESSING_KMD, + GSC_PROXY_BUFFER_SIZE - reply_offset); break; + } /* make sure the GSC-to-CSME proxy header is sane */ - ret = validate_proxy_header(to_csme_hdr, + ret = validate_proxy_header(gt, to_csme_hdr, GSC_PROXY_ADDRESSING_GSC, GSC_PROXY_ADDRESSING_CSME, GSC_PROXY_BUFFER_SIZE - reply_offset); @@ -262,7 +287,7 @@ static int proxy_query(struct xe_gsc *gsc) } /* make sure the CSME-to-GSC proxy header is sane */ - ret = validate_proxy_header(gsc->proxy.from_csme, + ret = validate_proxy_header(gt, gsc->proxy.from_csme, GSC_PROXY_ADDRESSING_CSME, GSC_PROXY_ADDRESSING_GSC, GSC_PROXY_BUFFER_SIZE - reply_offset); diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index d6744be01a68..41ab7fbebc19 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -748,10 +748,8 @@ static int do_gt_restart(struct xe_gt *gt) if (err) return err; - for_each_hw_engine(hwe, gt, id) { + for_each_hw_engine(hwe, gt, id) xe_reg_sr_apply_mmio(&hwe->reg_sr, gt); - xe_reg_sr_apply_whitelist(hwe); - } /* Get CCS mode in sync between sw/hw */ xe_gt_apply_ccs_mode(gt); diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c index 79c426dc2505..2606cd396df5 100644 --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c @@ -10,7 +10,6 @@ #include <drm/drm_exec.h> #include <drm/drm_managed.h> -#include <drm/ttm/ttm_execbuf_util.h> #include "abi/guc_actions_abi.h" #include "xe_bo.h" diff --git a/drivers/gpu/drm/xe/xe_gt_printk.h b/drivers/gpu/drm/xe/xe_gt_printk.h index 5dc71394372d..11da0228cea7 100644 --- a/drivers/gpu/drm/xe/xe_gt_printk.h +++ b/drivers/gpu/drm/xe/xe_gt_printk.h @@ -60,6 +60,21 @@ static inline void __xe_gt_printfn_info(struct drm_printer *p, struct va_format xe_gt_info(gt, "%pV", vaf); } +static inline void __xe_gt_printfn_dbg(struct drm_printer *p, struct va_format *vaf) +{ + struct xe_gt *gt = p->arg; + struct drm_printer dbg; + + /* + * The original xe_gt_dbg() callsite annotations are useless here, + * redirect to the tweaked drm_dbg_printer() instead. + */ + dbg = drm_dbg_printer(>_to_xe(gt)->drm, DRM_UT_DRIVER, NULL); + dbg.origin = p->origin; + + drm_printf(&dbg, "GT%u: %pV", gt->info.id, vaf); +} + /** * xe_gt_err_printer - Construct a &drm_printer that outputs to xe_gt_err() * @gt: the &xe_gt pointer to use in xe_gt_err() @@ -90,4 +105,20 @@ static inline struct drm_printer xe_gt_info_printer(struct xe_gt *gt) return p; } +/** + * xe_gt_dbg_printer - Construct a &drm_printer that outputs like xe_gt_dbg() + * @gt: the &xe_gt pointer to use in xe_gt_dbg() + * + * Return: The &drm_printer object. + */ +static inline struct drm_printer xe_gt_dbg_printer(struct xe_gt *gt) +{ + struct drm_printer p = { + .printfn = __xe_gt_printfn_dbg, + .arg = gt, + .origin = (const void *)_THIS_IP_, + }; + return p; +} + #endif diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c index 192643d63d22..65082f12f1a8 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c @@ -207,6 +207,11 @@ static int pf_push_vf_cfg_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u return pf_push_vf_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_PREEMPT_TIMEOUT_KEY, *preempt_timeout); } +static int pf_push_vf_cfg_sched_priority(struct xe_gt *gt, unsigned int vfid, u32 priority) +{ + return pf_push_vf_cfg_u32(gt, vfid, GUC_KLV_VF_CFG_SCHED_PRIORITY_KEY, priority); +} + static int pf_push_vf_cfg_lmem(struct xe_gt *gt, unsigned int vfid, u64 size) { return pf_push_vf_cfg_u64(gt, vfid, GUC_KLV_VF_CFG_LMEM_SIZE_KEY, size); @@ -1540,8 +1545,6 @@ static u64 pf_query_max_lmem(struct xe_gt *gt) #ifdef CONFIG_DRM_XE_DEBUG_SRIOV #define MAX_FAIR_LMEM SZ_128M /* XXX: make it small for the driver bringup */ -#else -#define MAX_FAIR_LMEM SZ_2G /* XXX: known issue with allocating BO over 2GiB */ #endif static u64 pf_estimate_fair_lmem(struct xe_gt *gt, unsigned int num_vfs) @@ -1767,6 +1770,77 @@ u32 xe_gt_sriov_pf_config_get_preempt_timeout(struct xe_gt *gt, unsigned int vfi return preempt_timeout; } +static const char *sched_priority_unit(u32 priority) +{ + return priority == GUC_SCHED_PRIORITY_LOW ? "(low)" : + priority == GUC_SCHED_PRIORITY_NORMAL ? "(normal)" : + priority == GUC_SCHED_PRIORITY_HIGH ? "(high)" : + "(?)"; +} + +static int pf_provision_sched_priority(struct xe_gt *gt, unsigned int vfid, u32 priority) +{ + struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid); + int err; + + err = pf_push_vf_cfg_sched_priority(gt, vfid, priority); + if (unlikely(err)) + return err; + + config->sched_priority = priority; + return 0; +} + +static int pf_get_sched_priority(struct xe_gt *gt, unsigned int vfid) +{ + struct xe_gt_sriov_config *config = pf_pick_vf_config(gt, vfid); + + return config->sched_priority; +} + +/** + * xe_gt_sriov_pf_config_set_sched_priority() - Configure scheduling priority. + * @gt: the &xe_gt + * @vfid: the VF identifier + * @priority: requested scheduling priority + * + * This function can only be called on PF. + * + * Return: 0 on success or a negative error code on failure. + */ +int xe_gt_sriov_pf_config_set_sched_priority(struct xe_gt *gt, unsigned int vfid, u32 priority) +{ + int err; + + mutex_lock(xe_gt_sriov_pf_master_mutex(gt)); + err = pf_provision_sched_priority(gt, vfid, priority); + mutex_unlock(xe_gt_sriov_pf_master_mutex(gt)); + + return pf_config_set_u32_done(gt, vfid, priority, + xe_gt_sriov_pf_config_get_sched_priority(gt, vfid), + "scheduling priority", sched_priority_unit, err); +} + +/** + * xe_gt_sriov_pf_config_get_sched_priority - Get VF's scheduling priority. + * @gt: the &xe_gt + * @vfid: the VF identifier + * + * This function can only be called on PF. + * + * Return: VF's (or PF's) scheduling priority. + */ +u32 xe_gt_sriov_pf_config_get_sched_priority(struct xe_gt *gt, unsigned int vfid) +{ + u32 priority; + + mutex_lock(xe_gt_sriov_pf_master_mutex(gt)); + priority = pf_get_sched_priority(gt, vfid); + mutex_unlock(xe_gt_sriov_pf_master_mutex(gt)); + + return priority; +} + static void pf_reset_config_sched(struct xe_gt *gt, struct xe_gt_sriov_config *config) { lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt)); diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h index 0c55aa40a1a7..f894e9d4abba 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h @@ -44,6 +44,9 @@ u32 xe_gt_sriov_pf_config_get_preempt_timeout(struct xe_gt *gt, unsigned int vfi int xe_gt_sriov_pf_config_set_preempt_timeout(struct xe_gt *gt, unsigned int vfid, u32 preempt_timeout); +u32 xe_gt_sriov_pf_config_get_sched_priority(struct xe_gt *gt, unsigned int vfid); +int xe_gt_sriov_pf_config_set_sched_priority(struct xe_gt *gt, unsigned int vfid, u32 priority); + u32 xe_gt_sriov_pf_config_get_threshold(struct xe_gt *gt, unsigned int vfid, enum xe_guc_klv_threshold_index index); int xe_gt_sriov_pf_config_set_threshold(struct xe_gt *gt, unsigned int vfid, diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h index 2d3b73d78f14..686c7b3b6d7a 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_config_types.h @@ -33,6 +33,8 @@ struct xe_gt_sriov_config { u32 exec_quantum; /** @preempt_timeout: preemption timeout in microseconds. */ u32 preempt_timeout; + /** @sched_priority: scheduling priority. */ + u32 sched_priority; /** @thresholds: GuC thresholds for adverse events notifications. */ u32 thresholds[XE_GUC_KLV_NUM_THRESHOLDS]; }; diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c index 05df4ab3514b..b2521dd6ec42 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c @@ -164,6 +164,7 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent) * │ │ ├── contexts_spare * │ │ ├── exec_quantum_ms * │ │ ├── preempt_timeout_us + * │ │ ├── sched_priority * │ ├── vf1 * │ │ ├── ggtt_quota * │ │ ├── lmem_quota @@ -171,6 +172,7 @@ static void pf_add_policy_attrs(struct xe_gt *gt, struct dentry *parent) * │ │ ├── contexts_quota * │ │ ├── exec_quantum_ms * │ │ ├── preempt_timeout_us + * │ │ ├── sched_priority */ #define DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(CONFIG, TYPE, FORMAT) \ @@ -209,6 +211,7 @@ DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(ctxs, u32, "%llu\n"); DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(dbs, u32, "%llu\n"); DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(exec_quantum, u32, "%llu\n"); DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(preempt_timeout, u32, "%llu\n"); +DEFINE_SRIOV_GT_CONFIG_DEBUGFS_ATTRIBUTE(sched_priority, u32, "%llu\n"); /* * /sys/kernel/debug/dri/0/ @@ -295,6 +298,8 @@ static void pf_add_config_attrs(struct xe_gt *gt, struct dentry *parent, unsigne &exec_quantum_fops); debugfs_create_file_unsafe("preempt_timeout_us", 0644, parent, parent, &preempt_timeout_fops); + debugfs_create_file_unsafe("sched_priority", 0644, parent, parent, + &sched_priority_fops); /* register all threshold attributes */ #define register_threshold_attribute(TAG, NAME, ...) \ diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_helpers.h b/drivers/gpu/drm/xe/xe_gt_sriov_pf_helpers.h index 0bf12d89ceb2..6af219d93c3b 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_helpers.h +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_helpers.h @@ -18,7 +18,7 @@ * is within a range of supported VF numbers (up to maximum number of VFs that * driver can support, including VF0 that represents the PF itself). * - * Note: Effective only on debug builds. See `Xe ASSERTs`_ for more information. + * Note: Effective only on debug builds. See `Xe Asserts`_ for more information. */ #define xe_gt_sriov_pf_assert_vfid(gt, vfid) xe_sriov_pf_assert_vfid(gt_to_xe(gt), (vfid)) diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c index fae5be5a2a11..c00fb354705f 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_policy.c @@ -135,14 +135,33 @@ static int pf_update_policy_u32(struct xe_gt *gt, u16 key, u32 *policy, u32 valu return 0; } +static void pf_bulk_reset_sched_priority(struct xe_gt *gt, u32 priority) +{ + unsigned int total_vfs = 1 + xe_gt_sriov_pf_get_totalvfs(gt); + unsigned int n; + + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); + lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt)); + + for (n = 0; n < total_vfs; n++) + gt->sriov.pf.vfs[n].config.sched_priority = priority; +} + static int pf_provision_sched_if_idle(struct xe_gt *gt, bool enable) { + int err; + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt)); - return pf_update_policy_bool(gt, GUC_KLV_VGT_POLICY_SCHED_IF_IDLE_KEY, - >->sriov.pf.policy.guc.sched_if_idle, - enable); + err = pf_update_policy_bool(gt, GUC_KLV_VGT_POLICY_SCHED_IF_IDLE_KEY, + >->sriov.pf.policy.guc.sched_if_idle, + enable); + + if (!err) + pf_bulk_reset_sched_priority(gt, enable ? GUC_SCHED_PRIORITY_NORMAL : + GUC_SCHED_PRIORITY_LOW); + return err; } static int pf_reprovision_sched_if_idle(struct xe_gt *gt) diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c index d3baba50f085..cca5d5732802 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c @@ -27,6 +27,7 @@ #include "xe_guc_relay.h" #include "xe_mmio.h" #include "xe_sriov.h" +#include "xe_sriov_vf.h" #include "xe_uc_fw.h" #include "xe_wopcm.h" @@ -223,6 +224,44 @@ int xe_gt_sriov_vf_bootstrap(struct xe_gt *gt) return 0; } +static int guc_action_vf_notify_resfix_done(struct xe_guc *guc) +{ + u32 request[GUC_HXG_REQUEST_MSG_MIN_LEN] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE), + }; + int ret; + + ret = xe_guc_mmio_send(guc, request, ARRAY_SIZE(request)); + + return ret > 0 ? -EPROTO : ret; +} + +/** + * xe_gt_sriov_vf_notify_resfix_done - Notify GuC about resource fixups apply completed. + * @gt: the &xe_gt struct instance linked to target GuC + * + * Returns: 0 if the operation completed successfully, or a negative error + * code otherwise. + */ +int xe_gt_sriov_vf_notify_resfix_done(struct xe_gt *gt) +{ + struct xe_guc *guc = >->uc.guc; + int err; + + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); + + err = guc_action_vf_notify_resfix_done(guc); + if (unlikely(err)) + xe_gt_sriov_err(gt, "Failed to notify GuC about resource fixup done (%pe)\n", + ERR_PTR(err)); + else + xe_gt_sriov_dbg_verbose(gt, "sent GuC resource fixup done\n"); + + return err; +} + static int guc_action_query_single_klv(struct xe_guc *guc, u32 key, u32 *value, u32 value_len) { @@ -692,6 +731,30 @@ failed: return err; } +/** + * xe_gt_sriov_vf_migrated_event_handler - Start a VF migration recovery, + * or just mark that a GuC is ready for it. + * @gt: the &xe_gt struct instance linked to target GuC + * + * This function shall be called only by VF. + */ +void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt) +{ + struct xe_device *xe = gt_to_xe(gt); + + xe_gt_assert(gt, IS_SRIOV_VF(xe)); + + set_bit(gt->info.id, &xe->sriov.vf.migration.gt_flags); + /* + * We need to be certain that if all flags were set, at least one + * thread will notice that and schedule the recovery. + */ + smp_mb__after_atomic(); + + xe_gt_sriov_info(gt, "ready for recovery after migration\n"); + xe_sriov_vf_start_migration_recovery(xe); +} + static bool vf_is_negotiated(struct xe_gt *gt, u16 major, u16 minor) { xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h index e541ce57bec2..912d20814261 100644 --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h @@ -17,6 +17,8 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt); int xe_gt_sriov_vf_connect(struct xe_gt *gt); int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt); int xe_gt_sriov_vf_prepare_ggtt(struct xe_gt *gt); +int xe_gt_sriov_vf_notify_resfix_done(struct xe_gt *gt); +void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt); u32 xe_gt_sriov_vf_gmdid(struct xe_gt *gt); u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt); diff --git a/drivers/gpu/drm/xe/xe_gt_stats.h b/drivers/gpu/drm/xe/xe_gt_stats.h index 91d944f6c4e4..38325ef53617 100644 --- a/drivers/gpu/drm/xe/xe_gt_stats.h +++ b/drivers/gpu/drm/xe/xe_gt_stats.h @@ -6,15 +6,11 @@ #ifndef _XE_GT_STATS_H_ #define _XE_GT_STATS_H_ +#include "xe_gt_stats_types.h" + struct xe_gt; struct drm_printer; -enum xe_gt_stats_id { - XE_GT_STATS_ID_TLB_INVAL, - /* must be the last entry */ - __XE_GT_STATS_NUM_IDS, -}; - #ifdef CONFIG_DEBUG_FS int xe_gt_stats_print_info(struct xe_gt *gt, struct drm_printer *p); void xe_gt_stats_incr(struct xe_gt *gt, const enum xe_gt_stats_id id, int incr); diff --git a/drivers/gpu/drm/xe/xe_gt_stats_types.h b/drivers/gpu/drm/xe/xe_gt_stats_types.h new file mode 100644 index 000000000000..2fc055e39f27 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_gt_stats_types.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef _XE_GT_STATS_TYPES_H_ +#define _XE_GT_STATS_TYPES_H_ + +enum xe_gt_stats_id { + XE_GT_STATS_ID_TLB_INVAL, + /* must be the last entry */ + __XE_GT_STATS_NUM_IDS, +}; + +#endif diff --git a/drivers/gpu/drm/xe/xe_gt_throttle.c b/drivers/gpu/drm/xe/xe_gt_throttle.c index 03b225364101..8db78d616b6f 100644 --- a/drivers/gpu/drm/xe/xe_gt_throttle.c +++ b/drivers/gpu/drm/xe/xe_gt_throttle.c @@ -8,6 +8,7 @@ #include <regs/xe_gt_regs.h> #include "xe_device.h" #include "xe_gt.h" +#include "xe_gt_printk.h" #include "xe_gt_sysfs.h" #include "xe_gt_throttle.h" #include "xe_mmio.h" @@ -53,6 +54,7 @@ static u32 read_status(struct xe_gt *gt) { u32 status = xe_gt_throttle_get_limit_reasons(gt) & GT0_PERF_LIMIT_REASONS_MASK; + xe_gt_dbg(gt, "throttle reasons: 0x%08x\n", status); return status; } diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c index 6146d1776bda..665927b80e9e 100644 --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c @@ -261,9 +261,18 @@ static int xe_gt_tlb_invalidation_guc(struct xe_gt *gt, 0, /* seqno, replaced in send_tlb_invalidation */ MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC), }; + int ret; + + ret = send_tlb_invalidation(>->uc.guc, fence, action, + ARRAY_SIZE(action)); + /* + * -ECANCELED indicates the CT is stopped for a GT reset. TLB caches + * should be nuked on a GT reset so this error can be ignored. + */ + if (ret == -ECANCELED) + return 0; - return send_tlb_invalidation(>->uc.guc, fence, action, - ARRAY_SIZE(action)); + return ret; } /** diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h index a287b98ee70b..6e66bf0e8b3f 100644 --- a/drivers/gpu/drm/xe/xe_gt_types.h +++ b/drivers/gpu/drm/xe/xe_gt_types.h @@ -11,10 +11,10 @@ #include "xe_gt_idle_types.h" #include "xe_gt_sriov_pf_types.h" #include "xe_gt_sriov_vf_types.h" -#include "xe_gt_stats.h" +#include "xe_gt_stats_types.h" #include "xe_hw_engine_types.h" #include "xe_hw_fence_types.h" -#include "xe_oa.h" +#include "xe_oa_types.h" #include "xe_reg_sr_types.h" #include "xe_sa_types.h" #include "xe_uc_types.h" diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 7f704346a8f4..4e2868efb620 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -44,7 +44,15 @@ static u32 guc_bo_ggtt_addr(struct xe_guc *guc, struct xe_bo *bo) { struct xe_device *xe = guc_to_xe(guc); - u32 addr = xe_bo_ggtt_addr(bo); + u32 addr; + + /* + * For most BOs, the address on the allocating tile is fine. However for + * some, e.g. G2G CTB, the address on a specific tile is required as it + * might be different for each tile. So, just always ask for the address + * on the target GuC. + */ + addr = __xe_bo_ggtt_addr(bo, gt_to_tile(guc_to_gt(guc))->id); /* GuC addresses above GUC_GGTT_TOP don't map through the GTT */ xe_assert(xe, addr >= xe_wopcm_size(guc_to_xe(guc))); @@ -244,6 +252,293 @@ static void guc_write_params(struct xe_guc *guc) xe_mmio_write32(>->mmio, SOFT_SCRATCH(1 + i), guc->params[i]); } +static int guc_action_register_g2g_buffer(struct xe_guc *guc, u32 type, u32 dst_tile, u32 dst_dev, + u32 desc_addr, u32 buff_addr, u32 size) +{ + struct xe_gt *gt = guc_to_gt(guc); + struct xe_device *xe = gt_to_xe(gt); + u32 action[] = { + XE_GUC_ACTION_REGISTER_G2G, + FIELD_PREP(XE_G2G_REGISTER_SIZE, size / SZ_4K - 1) | + FIELD_PREP(XE_G2G_REGISTER_TYPE, type) | + FIELD_PREP(XE_G2G_REGISTER_TILE, dst_tile) | + FIELD_PREP(XE_G2G_REGISTER_DEVICE, dst_dev), + desc_addr, + buff_addr, + }; + + xe_assert(xe, (type == XE_G2G_TYPE_IN) || (type == XE_G2G_TYPE_OUT)); + xe_assert(xe, !(size % SZ_4K)); + + return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action)); +} + +static int guc_action_deregister_g2g_buffer(struct xe_guc *guc, u32 type, u32 dst_tile, u32 dst_dev) +{ + struct xe_gt *gt = guc_to_gt(guc); + struct xe_device *xe = gt_to_xe(gt); + u32 action[] = { + XE_GUC_ACTION_DEREGISTER_G2G, + FIELD_PREP(XE_G2G_DEREGISTER_TYPE, type) | + FIELD_PREP(XE_G2G_DEREGISTER_TILE, dst_tile) | + FIELD_PREP(XE_G2G_DEREGISTER_DEVICE, dst_dev), + }; + + xe_assert(xe, (type == XE_G2G_TYPE_IN) || (type == XE_G2G_TYPE_OUT)); + + return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action)); +} + +#define G2G_DEV(gt) (((gt)->info.type == XE_GT_TYPE_MAIN) ? 0 : 1) + +#define G2G_BUFFER_SIZE (SZ_4K) +#define G2G_DESC_SIZE (64) +#define G2G_DESC_AREA_SIZE (SZ_4K) + +/* + * Generate a unique id for each bi-directional CTB for each pair of + * near and far tiles/devices. The id can then be used as an index into + * a single allocation that is sub-divided into multiple CTBs. + * + * For example, with two devices per tile and two tiles, the table should + * look like: + * Far <tile>.<dev> + * 0.0 0.1 1.0 1.1 + * N 0.0 --/-- 00/01 02/03 04/05 + * e 0.1 01/00 --/-- 06/07 08/09 + * a 1.0 03/02 07/06 --/-- 10/11 + * r 1.1 05/04 09/08 11/10 --/-- + * + * Where each entry is Rx/Tx channel id. + * + * So GuC #3 (tile 1, dev 1) talking to GuC #2 (tile 1, dev 0) would + * be reading from channel #11 and writing to channel #10. Whereas, + * GuC #2 talking to GuC #3 would be read on #10 and write to #11. + */ +static unsigned int g2g_slot(u32 near_tile, u32 near_dev, u32 far_tile, u32 far_dev, + u32 type, u32 max_inst, bool have_dev) +{ + u32 near = near_tile, far = far_tile; + u32 idx = 0, x, y, direction; + int i; + + if (have_dev) { + near = (near << 1) | near_dev; + far = (far << 1) | far_dev; + } + + /* No need to send to one's self */ + if (far == near) + return -1; + + if (far > near) { + /* Top right table half */ + x = far; + y = near; + + /* T/R is 'forwards' direction */ + direction = type; + } else { + /* Bottom left table half */ + x = near; + y = far; + + /* B/L is 'backwards' direction */ + direction = (1 - type); + } + + /* Count the rows prior to the target */ + for (i = y; i > 0; i--) + idx += max_inst - i; + + /* Count this row up to the target */ + idx += (x - 1 - y); + + /* Slots are in Rx/Tx pairs */ + idx *= 2; + + /* Pick Rx/Tx direction */ + idx += direction; + + return idx; +} + +static int guc_g2g_register(struct xe_guc *near_guc, struct xe_gt *far_gt, u32 type, bool have_dev) +{ + struct xe_gt *near_gt = guc_to_gt(near_guc); + struct xe_device *xe = gt_to_xe(near_gt); + struct xe_bo *g2g_bo; + u32 near_tile = gt_to_tile(near_gt)->id; + u32 near_dev = G2G_DEV(near_gt); + u32 far_tile = gt_to_tile(far_gt)->id; + u32 far_dev = G2G_DEV(far_gt); + u32 max = xe->info.gt_count; + u32 base, desc, buf; + int slot; + + /* G2G is not allowed between different cards */ + xe_assert(xe, xe == gt_to_xe(far_gt)); + + g2g_bo = near_guc->g2g.bo; + xe_assert(xe, g2g_bo); + + slot = g2g_slot(near_tile, near_dev, far_tile, far_dev, type, max, have_dev); + xe_assert(xe, slot >= 0); + + base = guc_bo_ggtt_addr(near_guc, g2g_bo); + desc = base + slot * G2G_DESC_SIZE; + buf = base + G2G_DESC_AREA_SIZE + slot * G2G_BUFFER_SIZE; + + xe_assert(xe, (desc - base + G2G_DESC_SIZE) <= G2G_DESC_AREA_SIZE); + xe_assert(xe, (buf - base + G2G_BUFFER_SIZE) <= g2g_bo->size); + + return guc_action_register_g2g_buffer(near_guc, type, far_tile, far_dev, + desc, buf, G2G_BUFFER_SIZE); +} + +static void guc_g2g_deregister(struct xe_guc *guc, u32 far_tile, u32 far_dev, u32 type) +{ + guc_action_deregister_g2g_buffer(guc, type, far_tile, far_dev); +} + +static u32 guc_g2g_size(struct xe_guc *guc) +{ + struct xe_gt *gt = guc_to_gt(guc); + struct xe_device *xe = gt_to_xe(gt); + unsigned int count = xe->info.gt_count; + u32 num_channels = (count * (count - 1)) / 2; + + xe_assert(xe, num_channels * XE_G2G_TYPE_LIMIT * G2G_DESC_SIZE <= G2G_DESC_AREA_SIZE); + + return num_channels * XE_G2G_TYPE_LIMIT * G2G_BUFFER_SIZE + G2G_DESC_AREA_SIZE; +} + +static bool xe_guc_g2g_wanted(struct xe_device *xe) +{ + /* Can't do GuC to GuC communication if there is only one GuC */ + if (xe->info.gt_count <= 1) + return false; + + /* No current user */ + return false; +} + +static int guc_g2g_alloc(struct xe_guc *guc) +{ + struct xe_gt *gt = guc_to_gt(guc); + struct xe_device *xe = gt_to_xe(gt); + struct xe_tile *tile = gt_to_tile(gt); + struct xe_bo *bo; + u32 g2g_size; + + if (guc->g2g.bo) + return 0; + + if (gt->info.id != 0) { + struct xe_gt *root_gt = xe_device_get_gt(xe, 0); + struct xe_guc *root_guc = &root_gt->uc.guc; + struct xe_bo *bo; + + bo = xe_bo_get(root_guc->g2g.bo); + if (!bo) + return -ENODEV; + + guc->g2g.bo = bo; + guc->g2g.owned = false; + return 0; + } + + g2g_size = guc_g2g_size(guc); + bo = xe_managed_bo_create_pin_map(xe, tile, g2g_size, + XE_BO_FLAG_VRAM_IF_DGFX(tile) | + XE_BO_FLAG_GGTT | + XE_BO_FLAG_GGTT_ALL | + XE_BO_FLAG_GGTT_INVALIDATE); + if (IS_ERR(bo)) + return PTR_ERR(bo); + + xe_map_memset(xe, &bo->vmap, 0, 0, g2g_size); + guc->g2g.bo = bo; + guc->g2g.owned = true; + + return 0; +} + +static void guc_g2g_fini(struct xe_guc *guc) +{ + if (!guc->g2g.bo) + return; + + /* Unpinning the owned object is handled by generic shutdown */ + if (!guc->g2g.owned) + xe_bo_put(guc->g2g.bo); + + guc->g2g.bo = NULL; +} + +static int guc_g2g_start(struct xe_guc *guc) +{ + struct xe_gt *far_gt, *gt = guc_to_gt(guc); + struct xe_device *xe = gt_to_xe(gt); + unsigned int i, j; + int t, err; + bool have_dev; + + if (!guc->g2g.bo) { + int ret; + + ret = guc_g2g_alloc(guc); + if (ret) + return ret; + } + + /* GuC interface will need extending if more GT device types are ever created. */ + xe_gt_assert(gt, (gt->info.type == XE_GT_TYPE_MAIN) || (gt->info.type == XE_GT_TYPE_MEDIA)); + + /* Channel numbering depends on whether there are multiple GTs per tile */ + have_dev = xe->info.gt_count > xe->info.tile_count; + + for_each_gt(far_gt, xe, i) { + u32 far_tile, far_dev; + + if (far_gt->info.id == gt->info.id) + continue; + + far_tile = gt_to_tile(far_gt)->id; + far_dev = G2G_DEV(far_gt); + + for (t = 0; t < XE_G2G_TYPE_LIMIT; t++) { + err = guc_g2g_register(guc, far_gt, t, have_dev); + if (err) { + while (--t >= 0) + guc_g2g_deregister(guc, far_tile, far_dev, t); + goto err_deregister; + } + } + } + + return 0; + +err_deregister: + for_each_gt(far_gt, xe, j) { + u32 tile, dev; + + if (far_gt->info.id == gt->info.id) + continue; + + if (j >= i) + break; + + tile = gt_to_tile(far_gt)->id; + dev = G2G_DEV(far_gt); + + for (t = 0; t < XE_G2G_TYPE_LIMIT; t++) + guc_g2g_deregister(guc, tile, dev, t); + } + + return err; +} + static void guc_fini_hw(void *arg) { struct xe_guc *guc = arg; @@ -253,6 +548,8 @@ static void guc_fini_hw(void *arg) fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); xe_uc_fini_hw(&guc_to_gt(guc)->uc); xe_force_wake_put(gt_to_fw(gt), fw_ref); + + guc_g2g_fini(guc); } /** @@ -423,7 +720,16 @@ int xe_guc_init_post_hwconfig(struct xe_guc *guc) int xe_guc_post_load_init(struct xe_guc *guc) { + int ret; + xe_guc_ads_populate_post_load(&guc->ads); + + if (xe_guc_g2g_wanted(guc_to_xe(guc))) { + ret = guc_g2g_start(guc); + if (ret) + return ret; + } + guc->submission_state.enabled = true; return 0; @@ -945,7 +1251,6 @@ int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request, BUILD_BUG_ON(VF_SW_FLAG_COUNT != MED_VF_SW_FLAG_COUNT); - xe_assert(xe, !xe_guc_ct_enabled(&guc->ct)); xe_assert(xe, len); xe_assert(xe, len <= VF_SW_FLAG_COUNT); xe_assert(xe, len <= MED_VF_SW_FLAG_COUNT); @@ -1099,10 +1404,21 @@ int xe_guc_self_cfg64(struct xe_guc *guc, u16 key, u64 val) return guc_self_cfg(guc, key, 2, val); } +static void xe_guc_sw_0_irq_handler(struct xe_guc *guc) +{ + struct xe_gt *gt = guc_to_gt(guc); + + if (IS_SRIOV_VF(gt_to_xe(gt))) + xe_gt_sriov_vf_migrated_event_handler(gt); +} + void xe_guc_irq_handler(struct xe_guc *guc, const u16 iir) { if (iir & GUC_INTR_GUC2HOST) xe_guc_ct_irq_handler(&guc->ct); + + if (iir & GUC_INTR_SW_INT_0) + xe_guc_sw_0_irq_handler(guc); } void xe_guc_sanitize(struct xe_guc *guc) diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c index 4e746ae98888..943146e5b460 100644 --- a/drivers/gpu/drm/xe/xe_guc_ads.c +++ b/drivers/gpu/drm/xe/xe_guc_ads.c @@ -231,11 +231,6 @@ static size_t guc_ads_size(struct xe_guc_ads *ads) guc_ads_private_data_size(ads); } -static bool needs_wa_1607983814(struct xe_device *xe) -{ - return GRAPHICS_VERx100(xe) < 1250; -} - static size_t calculate_regset_size(struct xe_gt *gt) { struct xe_reg_sr_entry *sr_entry; @@ -250,7 +245,7 @@ static size_t calculate_regset_size(struct xe_gt *gt) count += ADS_REGSET_EXTRA_MAX * XE_NUM_HW_ENGINES; - if (needs_wa_1607983814(gt_to_xe(gt))) + if (XE_WA(gt, 1607983814)) count += LNCFCMOCS_REG_COUNT; return count * sizeof(struct guc_mmio_reg); @@ -709,7 +704,6 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads, struct iosys_map *regset_map, struct xe_hw_engine *hwe) { - struct xe_device *xe = ads_to_xe(ads); struct xe_hw_engine *hwe_rcs_reset_domain = xe_gt_any_hw_engine_by_reset_domain(hwe->gt, XE_ENGINE_CLASS_RENDER); struct xe_reg_sr_entry *entry; @@ -740,8 +734,7 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads, guc_mmio_regset_write_one(ads, regset_map, e->reg, count++); } - /* Wa_1607983814 */ - if (needs_wa_1607983814(xe) && hwe->class == XE_ENGINE_CLASS_RENDER) { + if (XE_WA(hwe->gt, 1607983814) && hwe->class == XE_ENGINE_CLASS_RENDER) { for (i = 0; i < LNCFCMOCS_REG_COUNT; i++) { guc_mmio_regset_write_one(ads, regset_map, XELP_LNCFCMOCS(i), count++); diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c index d63912d28246..137571fae4ed 100644 --- a/drivers/gpu/drm/xe/xe_guc_capture.c +++ b/drivers/gpu/drm/xe/xe_guc_capture.c @@ -1806,7 +1806,6 @@ void xe_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, struct drm if (!devcore_snapshot->matched_node) return; - xe_gt_assert(gt, snapshot->source <= XE_ENGINE_CAPTURE_SOURCE_GUC); xe_gt_assert(gt, snapshot->hwe); capture_class = xe_engine_class_to_guc_capture_class(snapshot->hwe->class); @@ -1815,7 +1814,8 @@ void xe_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, struct drm snapshot->name ? snapshot->name : "", snapshot->logical_instance); drm_printf(p, "\tCapture_source: %s\n", - snapshot->source == XE_ENGINE_CAPTURE_SOURCE_GUC ? "GuC" : "Manual"); + devcore_snapshot->matched_node->source == XE_ENGINE_CAPTURE_SOURCE_GUC ? + "GuC" : "Manual"); drm_printf(p, "\tCoverage: %s\n", grptype[devcore_snapshot->matched_node->is_partial]); drm_printf(p, "\tForcewake: domain 0x%x, ref %d\n", snapshot->forcewake.domain, snapshot->forcewake.ref); @@ -1840,29 +1840,24 @@ void xe_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, struct drm } /** - * xe_guc_capture_get_matching_and_lock - Matching GuC capture for the job. - * @job: The job object. + * xe_guc_capture_get_matching_and_lock - Matching GuC capture for the queue. + * @q: The exec queue object * - * Search within the capture outlist for the job, could be used for check if - * GuC capture is ready for the job. + * Search within the capture outlist for the queue, could be used for check if + * GuC capture is ready for the queue. * If found, the locked boolean of the node will be flagged. * * Returns: found guc-capture node ptr else NULL */ struct __guc_capture_parsed_output * -xe_guc_capture_get_matching_and_lock(struct xe_sched_job *job) +xe_guc_capture_get_matching_and_lock(struct xe_exec_queue *q) { struct xe_hw_engine *hwe; enum xe_hw_engine_id id; - struct xe_exec_queue *q; struct xe_device *xe; u16 guc_class = GUC_LAST_ENGINE_CLASS + 1; struct xe_devcoredump_snapshot *ss; - if (!job) - return NULL; - - q = job->q; if (!q || !q->gt) return NULL; @@ -1874,7 +1869,7 @@ xe_guc_capture_get_matching_and_lock(struct xe_sched_job *job) if (ss->matched_node && ss->matched_node->source == XE_ENGINE_CAPTURE_SOURCE_GUC) return ss->matched_node; - /* Find hwe for the job */ + /* Find hwe for the queue */ for_each_hw_engine(hwe, q->gt, id) { if (hwe != q->hwe) continue; @@ -1906,17 +1901,16 @@ xe_guc_capture_get_matching_and_lock(struct xe_sched_job *job) } /** - * xe_engine_snapshot_capture_for_job - Take snapshot of associated engine - * @job: The job object + * xe_engine_snapshot_capture_for_queue - Take snapshot of associated engine + * @q: The exec queue object * * Take snapshot of associated HW Engine * * Returns: None. */ void -xe_engine_snapshot_capture_for_job(struct xe_sched_job *job) +xe_engine_snapshot_capture_for_queue(struct xe_exec_queue *q) { - struct xe_exec_queue *q = job->q; struct xe_device *xe = gt_to_xe(q->gt); struct xe_devcoredump *coredump = &xe->devcoredump; struct xe_hw_engine *hwe; @@ -1934,11 +1928,12 @@ xe_engine_snapshot_capture_for_job(struct xe_sched_job *job) } if (!coredump->snapshot.hwe[id]) { - coredump->snapshot.hwe[id] = xe_hw_engine_snapshot_capture(hwe, job); + coredump->snapshot.hwe[id] = + xe_hw_engine_snapshot_capture(hwe, q); } else { struct __guc_capture_parsed_output *new; - new = xe_guc_capture_get_matching_and_lock(job); + new = xe_guc_capture_get_matching_and_lock(q); if (new) { struct xe_guc *guc = &q->gt->uc.guc; diff --git a/drivers/gpu/drm/xe/xe_guc_capture.h b/drivers/gpu/drm/xe/xe_guc_capture.h index 97a795d13dd1..20a078dc4b85 100644 --- a/drivers/gpu/drm/xe/xe_guc_capture.h +++ b/drivers/gpu/drm/xe/xe_guc_capture.h @@ -11,10 +11,10 @@ #include "xe_guc.h" #include "xe_guc_fwif.h" +struct xe_exec_queue; struct xe_guc; struct xe_hw_engine; struct xe_hw_engine_snapshot; -struct xe_sched_job; static inline enum guc_capture_list_class_type xe_guc_class_to_capture_class(u16 class) { @@ -50,10 +50,10 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc); const struct __guc_mmio_reg_descr_group * xe_guc_capture_get_reg_desc_list(struct xe_gt *gt, u32 owner, u32 type, enum guc_capture_list_class_type capture_class, bool is_ext); -struct __guc_capture_parsed_output *xe_guc_capture_get_matching_and_lock(struct xe_sched_job *job); +struct __guc_capture_parsed_output *xe_guc_capture_get_matching_and_lock(struct xe_exec_queue *q); void xe_engine_manual_capture(struct xe_hw_engine *hwe, struct xe_hw_engine_snapshot *snapshot); void xe_engine_snapshot_print(struct xe_hw_engine_snapshot *snapshot, struct drm_printer *p); -void xe_engine_snapshot_capture_for_job(struct xe_sched_job *job); +void xe_engine_snapshot_capture_for_queue(struct xe_exec_queue *q); void xe_guc_capture_steered_list_init(struct xe_guc *guc); void xe_guc_capture_put_matched_nodes(struct xe_guc *guc); int xe_guc_capture_init(struct xe_guc *guc); diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 8aeb1789805c..7d33f3a11e61 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -54,6 +54,7 @@ enum { CT_DEAD_PARSE_G2H_UNKNOWN, /* 0x1000 */ CT_DEAD_PARSE_G2H_ORIGIN, /* 0x2000 */ CT_DEAD_PARSE_G2H_TYPE, /* 0x4000 */ + CT_DEAD_CRASH, /* 0x8000 */ }; static void ct_dead_worker_func(struct work_struct *w); @@ -469,8 +470,10 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct) * after any existing dead state has been dumped. */ spin_lock_irq(&ct->dead.lock); - if (ct->dead.reason) + if (ct->dead.reason) { ct->dead.reason |= (1 << CT_DEAD_STATE_REARM); + queue_work(system_unbound_wq, &ct->dead.worker); + } spin_unlock_irq(&ct->dead.lock); #endif @@ -1017,7 +1020,6 @@ retry_same_fence: } ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ); - if (!ret) { LNL_FLUSH_WORK(&ct->g2h_worker); if (g2h_fence.done) { @@ -1121,6 +1123,24 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len) return 0; } +static int guc_crash_process_msg(struct xe_guc_ct *ct, u32 action) +{ + struct xe_gt *gt = ct_to_gt(ct); + + if (action == XE_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED) + xe_gt_err(gt, "GuC Crash dump notification\n"); + else if (action == XE_GUC_ACTION_NOTIFY_EXCEPTION) + xe_gt_err(gt, "GuC Exception notification\n"); + else + xe_gt_err(gt, "Unknown GuC crash notification: 0x%04X\n", action); + + CT_DEAD(ct, NULL, CRASH); + + kick_reset(ct); + + return 0; +} + static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) { struct xe_gt *gt = ct_to_gt(ct); @@ -1295,13 +1315,17 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len) case GUC_ACTION_GUC2PF_ADVERSE_EVENT: ret = xe_gt_sriov_pf_monitor_process_guc2pf(gt, hxg, hxg_len); break; + case XE_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED: + case XE_GUC_ACTION_NOTIFY_EXCEPTION: + ret = guc_crash_process_msg(ct, action); + break; default: xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action); } if (ret) { - xe_gt_err(gt, "G2H action 0x%04x failed (%pe)\n", - action, ERR_PTR(ret)); + xe_gt_err(gt, "G2H action %#04x failed (%pe) len %u msg %*ph\n", + action, ERR_PTR(ret), hxg_len, (int)sizeof(u32) * hxg_len, hxg); CT_DEAD(ct, NULL, PROCESS_FAILED); } diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h index 08ffe59f22fa..057153f89b30 100644 --- a/drivers/gpu/drm/xe/xe_guc_fwif.h +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h @@ -17,6 +17,7 @@ #define G2H_LEN_DW_TLB_INVALIDATE 3 #define GUC_ID_MAX 65535 +#define GUC_ID_UNKNOWN 0xffffffff #define GUC_CONTEXT_DISABLE 0 #define GUC_CONTEXT_ENABLE 1 diff --git a/drivers/gpu/drm/xe/xe_guc_klv_helpers.c b/drivers/gpu/drm/xe/xe_guc_klv_helpers.c index 9d99fe266d97..146a6eda9e06 100644 --- a/drivers/gpu/drm/xe/xe_guc_klv_helpers.c +++ b/drivers/gpu/drm/xe/xe_guc_klv_helpers.c @@ -49,6 +49,8 @@ const char *xe_guc_klv_key_to_string(u16 key) return "begin_db_id"; case GUC_KLV_VF_CFG_BEGIN_CONTEXT_ID_KEY: return "begin_ctx_id"; + case GUC_KLV_VF_CFG_SCHED_PRIORITY_KEY: + return "sched_priority"; /* VF CFG threshold keys */ #define define_threshold_key_to_string_case(TAG, NAME, ...) \ diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 6f4a9812b4f4..9c36329fe857 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -412,12 +412,11 @@ static const int xe_exec_queue_prio_to_guc[] = { static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q) { struct exec_queue_policy policy; - struct xe_device *xe = guc_to_xe(guc); enum xe_exec_queue_priority prio = q->sched_props.priority; u32 timeslice_us = q->sched_props.timeslice_us; u32 preempt_timeout_us = q->sched_props.preempt_timeout_us; - xe_assert(xe, exec_queue_registered(q)); + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q)); __guc_exec_queue_policy_start_klv(&policy, q->guc->id); __guc_exec_queue_policy_add_priority(&policy, xe_exec_queue_prio_to_guc[prio]); @@ -451,12 +450,11 @@ static void __register_mlrc_exec_queue(struct xe_guc *guc, struct guc_ctxt_registration_info *info) { #define MAX_MLRC_REG_SIZE (13 + XE_HW_ENGINE_MAX_INSTANCE * 2) - struct xe_device *xe = guc_to_xe(guc); u32 action[MAX_MLRC_REG_SIZE]; int len = 0; int i; - xe_assert(xe, xe_exec_queue_is_parallel(q)); + xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_parallel(q)); action[len++] = XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC; action[len++] = info->flags; @@ -479,7 +477,7 @@ static void __register_mlrc_exec_queue(struct xe_guc *guc, action[len++] = upper_32_bits(xe_lrc_descriptor(lrc)); } - xe_assert(xe, len <= MAX_MLRC_REG_SIZE); + xe_gt_assert(guc_to_gt(guc), len <= MAX_MLRC_REG_SIZE); #undef MAX_MLRC_REG_SIZE xe_guc_ct_send(&guc->ct, action, len, 0, 0); @@ -513,7 +511,7 @@ static void register_exec_queue(struct xe_exec_queue *q) struct xe_lrc *lrc = q->lrc[0]; struct guc_ctxt_registration_info info; - xe_assert(xe, !exec_queue_registered(q)); + xe_gt_assert(guc_to_gt(guc), !exec_queue_registered(q)); memset(&info, 0, sizeof(info)); info.context_idx = q->guc->id; @@ -603,7 +601,7 @@ static int wq_noop_append(struct xe_exec_queue *q) if (wq_wait_for_space(q, wq_space_until_wrap(q))) return -ENODEV; - xe_assert(xe, FIELD_FIT(WQ_LEN_MASK, len_dw)); + xe_gt_assert(guc_to_gt(guc), FIELD_FIT(WQ_LEN_MASK, len_dw)); parallel_write(xe, map, wq[q->guc->wqi_tail / sizeof(u32)], FIELD_PREP(WQ_TYPE_MASK, WQ_TYPE_NOOP) | @@ -643,13 +641,13 @@ static void wq_item_append(struct xe_exec_queue *q) wqi[i++] = lrc->ring.tail / sizeof(u64); } - xe_assert(xe, i == wqi_size / sizeof(u32)); + xe_gt_assert(guc_to_gt(guc), i == wqi_size / sizeof(u32)); iosys_map_incr(&map, offsetof(struct guc_submit_parallel_scratch, wq[q->guc->wqi_tail / sizeof(u32)])); xe_map_memcpy_to(xe, &map, 0, wqi, wqi_size); q->guc->wqi_tail += wqi_size; - xe_assert(xe, q->guc->wqi_tail <= WQ_SIZE); + xe_gt_assert(guc_to_gt(guc), q->guc->wqi_tail <= WQ_SIZE); xe_device_wmb(xe); @@ -661,7 +659,6 @@ static void wq_item_append(struct xe_exec_queue *q) static void submit_exec_queue(struct xe_exec_queue *q) { struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); struct xe_lrc *lrc = q->lrc[0]; u32 action[3]; u32 g2h_len = 0; @@ -669,7 +666,7 @@ static void submit_exec_queue(struct xe_exec_queue *q) int len = 0; bool extra_submit = false; - xe_assert(xe, exec_queue_registered(q)); + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q)); if (xe_exec_queue_is_parallel(q)) wq_item_append(q); @@ -716,12 +713,11 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job) struct xe_sched_job *job = to_xe_sched_job(drm_job); struct xe_exec_queue *q = job->q; struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); struct dma_fence *fence = NULL; bool lr = xe_exec_queue_is_lr(q); - xe_assert(xe, !(exec_queue_destroyed(q) || exec_queue_pending_disable(q)) || - exec_queue_banned(q) || exec_queue_suspended(q)); + xe_gt_assert(guc_to_gt(guc), !(exec_queue_destroyed(q) || exec_queue_pending_disable(q)) || + exec_queue_banned(q) || exec_queue_suspended(q)); trace_xe_sched_job_run(job); @@ -823,7 +819,7 @@ static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q) */ void xe_guc_submit_wedge(struct xe_guc *guc) { - struct xe_device *xe = guc_to_xe(guc); + struct xe_gt *gt = guc_to_gt(guc); struct xe_exec_queue *q; unsigned long index; int err; @@ -833,7 +829,8 @@ void xe_guc_submit_wedge(struct xe_guc *guc) err = devm_add_action_or_reset(guc_to_xe(guc)->drm.dev, guc_submit_wedged_fini, guc); if (err) { - drm_err(&xe->drm, "Failed to register xe_guc_submit clean-up on wedged.mode=2. Although device is wedged.\n"); + xe_gt_err(gt, "Failed to register clean-up on wedged.mode=2; " + "Although device is wedged.\n"); return; } @@ -865,11 +862,10 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w) container_of(w, struct xe_guc_exec_queue, lr_tdr); struct xe_exec_queue *q = ge->q; struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); struct xe_gpu_scheduler *sched = &ge->sched; bool wedged; - xe_assert(xe, xe_exec_queue_is_lr(q)); + xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_lr(q)); trace_xe_exec_queue_lr_cleanup(q); wedged = guc_submit_hint_wedged(exec_queue_to_guc(q)); @@ -903,13 +899,19 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w) !exec_queue_pending_disable(q) || xe_guc_read_stopped(guc), HZ * 5); if (!ret) { - drm_warn(&xe->drm, "Schedule disable failed to respond"); + xe_gt_warn(q->gt, "Schedule disable failed to respond, guc_id=%d\n", + q->guc->id); + xe_devcoredump(q, NULL, "Schedule disable failed to respond, guc_id=%d\n", + q->guc->id); xe_sched_submission_start(sched); xe_gt_reset_async(q->gt); return; } } + if (!exec_queue_killed(q) && !xe_lrc_ring_is_idle(q->lrc[0])) + xe_devcoredump(q, NULL, "LR job cleanup, guc_id=%d", q->guc->id); + xe_sched_submission_start(sched); } @@ -1068,13 +1070,13 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) * do manual capture first and decide later if we need to use it */ if (!exec_queue_killed(q) && !xe->devcoredump.captured && - !xe_guc_capture_get_matching_and_lock(job)) { + !xe_guc_capture_get_matching_and_lock(q)) { /* take force wake before engine register manual capture */ fw_ref = xe_force_wake_get(gt_to_fw(q->gt), XE_FORCEWAKE_ALL); if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) xe_gt_info(q->gt, "failed to get forcewake for coredump capture\n"); - xe_engine_snapshot_capture_for_job(job); + xe_engine_snapshot_capture_for_queue(q); xe_force_wake_put(gt_to_fw(q->gt), fw_ref); } @@ -1132,7 +1134,12 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) if (!ret || xe_guc_read_stopped(guc)) { trigger_reset: if (!ret) - xe_gt_warn(guc_to_gt(guc), "Schedule disable failed to respond"); + xe_gt_warn(guc_to_gt(guc), + "Schedule disable failed to respond, guc_id=%d", + q->guc->id); + xe_devcoredump(q, job, + "Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d", + q->guc->id, ret, xe_guc_read_stopped(guc)); set_exec_queue_extra_ref(q); xe_exec_queue_get(q); /* GT reset owns this */ set_exec_queue_banned(q); @@ -1162,7 +1169,10 @@ trigger_reset: trace_xe_sched_job_timedout(job); if (!exec_queue_killed(q)) - xe_devcoredump(job); + xe_devcoredump(q, job, + "Timedout job - seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx", + xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job), + q->guc->id, q->flags); /* * Kernel jobs should never fail, nor should VM jobs if they do @@ -1277,9 +1287,8 @@ static void __guc_exec_queue_process_msg_cleanup(struct xe_sched_msg *msg) { struct xe_exec_queue *q = msg->private_data; struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); - xe_assert(xe, !(q->flags & EXEC_QUEUE_FLAG_PERMANENT)); + xe_gt_assert(guc_to_gt(guc), !(q->flags & EXEC_QUEUE_FLAG_PERMANENT)); trace_xe_exec_queue_cleanup_entity(q); if (exec_queue_registered(q)) @@ -1315,11 +1324,10 @@ static void __suspend_fence_signal(struct xe_exec_queue *q) static void suspend_fence_signal(struct xe_exec_queue *q) { struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); - xe_assert(xe, exec_queue_suspended(q) || exec_queue_killed(q) || - xe_guc_read_stopped(guc)); - xe_assert(xe, q->guc->suspend_pending); + xe_gt_assert(guc_to_gt(guc), exec_queue_suspended(q) || exec_queue_killed(q) || + xe_guc_read_stopped(guc)); + xe_gt_assert(guc_to_gt(guc), q->guc->suspend_pending); __suspend_fence_signal(q); } @@ -1415,12 +1423,11 @@ static int guc_exec_queue_init(struct xe_exec_queue *q) { struct xe_gpu_scheduler *sched; struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); struct xe_guc_exec_queue *ge; long timeout; int err, i; - xe_assert(xe, xe_device_uc_enabled(guc_to_xe(guc))); + xe_gt_assert(guc_to_gt(guc), xe_device_uc_enabled(guc_to_xe(guc))); ge = kzalloc(sizeof(*ge), GFP_KERNEL); if (!ge) @@ -1633,9 +1640,8 @@ static void guc_exec_queue_resume(struct xe_exec_queue *q) struct xe_gpu_scheduler *sched = &q->guc->sched; struct xe_sched_msg *msg = q->guc->static_msgs + STATIC_MSG_RESUME; struct xe_guc *guc = exec_queue_to_guc(q); - struct xe_device *xe = guc_to_xe(guc); - xe_assert(xe, !q->guc->suspend_pending); + xe_gt_assert(guc_to_gt(guc), !q->guc->suspend_pending); xe_sched_msg_lock(sched); guc_exec_queue_try_add_msg(q, msg, RESUME); @@ -1708,7 +1714,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q) ban = true; } } else if (xe_exec_queue_is_lr(q) && - (xe_lrc_ring_head(q->lrc[0]) != xe_lrc_ring_tail(q->lrc[0]))) { + !xe_lrc_ring_is_idle(q->lrc[0])) { ban = true; } @@ -1747,9 +1753,8 @@ void xe_guc_submit_stop(struct xe_guc *guc) { struct xe_exec_queue *q; unsigned long index; - struct xe_device *xe = guc_to_xe(guc); - xe_assert(xe, xe_guc_read_stopped(guc) == 1); + xe_gt_assert(guc_to_gt(guc), xe_guc_read_stopped(guc) == 1); mutex_lock(&guc->submission_state.lock); @@ -1791,9 +1796,8 @@ int xe_guc_submit_start(struct xe_guc *guc) { struct xe_exec_queue *q; unsigned long index; - struct xe_device *xe = guc_to_xe(guc); - xe_assert(xe, xe_guc_read_stopped(guc) == 1); + xe_gt_assert(guc_to_gt(guc), xe_guc_read_stopped(guc) == 1); mutex_lock(&guc->submission_state.lock); atomic_dec(&guc->submission_state.stopped); @@ -1814,22 +1818,22 @@ int xe_guc_submit_start(struct xe_guc *guc) static struct xe_exec_queue * g2h_exec_queue_lookup(struct xe_guc *guc, u32 guc_id) { - struct xe_device *xe = guc_to_xe(guc); + struct xe_gt *gt = guc_to_gt(guc); struct xe_exec_queue *q; if (unlikely(guc_id >= GUC_ID_MAX)) { - drm_err(&xe->drm, "Invalid guc_id %u", guc_id); + xe_gt_err(gt, "Invalid guc_id %u\n", guc_id); return NULL; } q = xa_load(&guc->submission_state.exec_queue_lookup, guc_id); if (unlikely(!q)) { - drm_err(&xe->drm, "Not engine present for guc_id %u", guc_id); + xe_gt_err(gt, "Not engine present for guc_id %u\n", guc_id); return NULL; } - xe_assert(xe, guc_id >= q->guc->id); - xe_assert(xe, guc_id < (q->guc->id + q->width)); + xe_gt_assert(guc_to_gt(guc), guc_id >= q->guc->id); + xe_gt_assert(guc_to_gt(guc), guc_id < (q->guc->id + q->width)); return q; } @@ -1898,15 +1902,14 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q, int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len) { - struct xe_device *xe = guc_to_xe(guc); struct xe_exec_queue *q; - u32 guc_id = msg[0]; - u32 runnable_state = msg[1]; + u32 guc_id, runnable_state; - if (unlikely(len < 2)) { - drm_err(&xe->drm, "Invalid length %u", len); + if (unlikely(len < 2)) return -EPROTO; - } + + guc_id = msg[0]; + runnable_state = msg[1]; q = g2h_exec_queue_lookup(guc, guc_id); if (unlikely(!q)) @@ -1940,14 +1943,13 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q) int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len) { - struct xe_device *xe = guc_to_xe(guc); struct xe_exec_queue *q; - u32 guc_id = msg[0]; + u32 guc_id; - if (unlikely(len < 1)) { - drm_err(&xe->drm, "Invalid length %u", len); + if (unlikely(len < 1)) return -EPROTO; - } + + guc_id = msg[0]; q = g2h_exec_queue_lookup(guc, guc_id); if (unlikely(!q)) @@ -1969,14 +1971,13 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len) int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len) { struct xe_gt *gt = guc_to_gt(guc); - struct xe_device *xe = guc_to_xe(guc); struct xe_exec_queue *q; - u32 guc_id = msg[0]; + u32 guc_id; - if (unlikely(len < 1)) { - drm_err(&xe->drm, "Invalid length %u", len); + if (unlikely(len < 1)) return -EPROTO; - } + + guc_id = msg[0]; q = g2h_exec_queue_lookup(guc, guc_id); if (unlikely(!q)) @@ -2016,10 +2017,8 @@ int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len) { u32 status; - if (unlikely(len != XE_GUC_ACTION_STATE_CAPTURE_NOTIFICATION_DATA_LEN)) { - xe_gt_dbg(guc_to_gt(guc), "Invalid length %u", len); + if (unlikely(len != XE_GUC_ACTION_STATE_CAPTURE_NOTIFICATION_DATA_LEN)) return -EPROTO; - } status = msg[0] & XE_GUC_STATE_CAPTURE_EVENT_STATUS_MASK; if (status == XE_GUC_STATE_CAPTURE_EVENT_STATUS_NOSPACE) @@ -2034,13 +2033,21 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg, u32 len) { struct xe_gt *gt = guc_to_gt(guc); - struct xe_device *xe = guc_to_xe(guc); struct xe_exec_queue *q; - u32 guc_id = msg[0]; + u32 guc_id; - if (unlikely(len < 1)) { - drm_err(&xe->drm, "Invalid length %u", len); + if (unlikely(len < 1)) return -EPROTO; + + guc_id = msg[0]; + + if (guc_id == GUC_ID_UNKNOWN) { + /* + * GuC uses GUC_ID_UNKNOWN if it can not map the CAT fault to any PF/VF + * context. In such case only PF will be notified about that fault. + */ + xe_gt_err_ratelimited(gt, "Memory CAT error reported by GuC!\n"); + return 0; } q = g2h_exec_queue_lookup(guc, guc_id); @@ -2062,24 +2069,22 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg, int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len) { - struct xe_device *xe = guc_to_xe(guc); + struct xe_gt *gt = guc_to_gt(guc); u8 guc_class, instance; u32 reason; - if (unlikely(len != 3)) { - drm_err(&xe->drm, "Invalid length %u", len); + if (unlikely(len != 3)) return -EPROTO; - } guc_class = msg[0]; instance = msg[1]; reason = msg[2]; /* Unexpected failure of a hardware feature, log an actual error */ - drm_err(&xe->drm, "GuC engine reset request failed on %d:%d because 0x%08X", - guc_class, instance, reason); + xe_gt_err(gt, "GuC engine reset request failed on %d:%d because 0x%08X", + guc_class, instance, reason); - xe_gt_reset_async(guc_to_gt(guc)); + xe_gt_reset_async(gt); return 0; } diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h index fa75f57bf5da..83a41ebcdc91 100644 --- a/drivers/gpu/drm/xe/xe_guc_types.h +++ b/drivers/gpu/drm/xe/xe_guc_types.h @@ -64,6 +64,15 @@ struct xe_guc { struct xe_guc_pc pc; /** @dbm: GuC Doorbell Manager */ struct xe_guc_db_mgr dbm; + + /** @g2g: GuC to GuC communication state */ + struct { + /** @g2g.bo: Storage for GuC to GuC communication channels */ + struct xe_bo *bo; + /** @g2g.owned: Is the BO owned by this GT or just mapped in */ + bool owned; + } g2g; + /** @submission_state: GuC submission state */ struct { /** @submission_state.idm: GuC context ID Manager */ @@ -79,6 +88,7 @@ struct xe_guc { /** @submission_state.fini_wq: submit fini wait queue */ wait_queue_head_t fini_wq; } submission_state; + /** @hwconfig: Hardware config state */ struct { /** @hwconfig.bo: buffer object of the hardware config */ diff --git a/drivers/gpu/drm/xe/xe_heci_gsc.c b/drivers/gpu/drm/xe/xe_heci_gsc.c index 65b2e147c4b9..d765bfd3636b 100644 --- a/drivers/gpu/drm/xe/xe_heci_gsc.c +++ b/drivers/gpu/drm/xe/xe_heci_gsc.c @@ -92,7 +92,7 @@ void xe_heci_gsc_fini(struct xe_device *xe) { struct xe_heci_gsc *heci_gsc = &xe->heci_gsc; - if (!HAS_HECI_GSCFI(xe) && !HAS_HECI_CSCFI(xe)) + if (!xe->info.has_heci_gscfi && !xe->info.has_heci_cscfi) return; if (heci_gsc->adev) { @@ -177,7 +177,7 @@ void xe_heci_gsc_init(struct xe_device *xe) const struct heci_gsc_def *def; int ret; - if (!HAS_HECI_GSCFI(xe) && !HAS_HECI_CSCFI(xe)) + if (!xe->info.has_heci_gscfi && !xe->info.has_heci_cscfi) return; heci_gsc->irq = -1; @@ -222,7 +222,7 @@ void xe_heci_gsc_irq_handler(struct xe_device *xe, u32 iir) if ((iir & GSC_IRQ_INTF(1)) == 0) return; - if (!HAS_HECI_GSCFI(xe)) { + if (!xe->info.has_heci_gscfi) { drm_warn_once(&xe->drm, "GSC irq: not supported"); return; } @@ -242,7 +242,7 @@ void xe_heci_csc_irq_handler(struct xe_device *xe, u32 iir) if ((iir & CSC_IRQ_INTF(1)) == 0) return; - if (!HAS_HECI_CSCFI(xe)) { + if (!xe->info.has_heci_cscfi) { drm_warn_once(&xe->drm, "CSC irq: not supported"); return; } diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c index 1557acee3523..b19366744148 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -574,7 +574,6 @@ static int hw_engine_init(struct xe_gt *gt, struct xe_hw_engine *hwe, xe_gt_assert(gt, gt->info.engine_mask & BIT(id)); xe_reg_sr_apply_mmio(&hwe->reg_sr, gt); - xe_reg_sr_apply_whitelist(hwe); hwe->hwsp = xe_managed_bo_create_pin_map(xe, tile, SZ_4K, XE_BO_FLAG_VRAM_IF_DGFX(tile) | @@ -829,7 +828,7 @@ void xe_hw_engine_handle_irq(struct xe_hw_engine *hwe, u16 intr_vec) /** * xe_hw_engine_snapshot_capture - Take a quick snapshot of the HW Engine. * @hwe: Xe HW Engine. - * @job: The job object. + * @q: The exec queue object. * * This can be printed out in a later stage like during dev_coredump * analysis. @@ -838,7 +837,7 @@ void xe_hw_engine_handle_irq(struct xe_hw_engine *hwe, u16 intr_vec) * caller, using `xe_hw_engine_snapshot_free`. */ struct xe_hw_engine_snapshot * -xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe, struct xe_sched_job *job) +xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe, struct xe_exec_queue *q) { struct xe_hw_engine_snapshot *snapshot; struct __guc_capture_parsed_output *node; @@ -864,15 +863,14 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe, struct xe_sched_job *job if (IS_SRIOV_VF(gt_to_xe(hwe->gt))) return snapshot; - if (job) { + if (q) { /* If got guc capture, set source to GuC */ - node = xe_guc_capture_get_matching_and_lock(job); + node = xe_guc_capture_get_matching_and_lock(q); if (node) { struct xe_device *xe = gt_to_xe(hwe->gt); struct xe_devcoredump *coredump = &xe->devcoredump; coredump->snapshot.matched_node = node; - snapshot->source = XE_ENGINE_CAPTURE_SOURCE_GUC; xe_gt_dbg(hwe->gt, "Found and locked GuC-err-capture node"); return snapshot; } @@ -880,7 +878,6 @@ xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe, struct xe_sched_job *job /* otherwise, do manual capture */ xe_engine_manual_capture(hwe, snapshot); - snapshot->source = XE_ENGINE_CAPTURE_SOURCE_MANUAL; xe_gt_dbg(hwe->gt, "Proceeding with manual engine snapshot"); return snapshot; diff --git a/drivers/gpu/drm/xe/xe_hw_engine.h b/drivers/gpu/drm/xe/xe_hw_engine.h index da0a6922a26f..6b5f9fa2a594 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine.h +++ b/drivers/gpu/drm/xe/xe_hw_engine.h @@ -11,7 +11,7 @@ struct drm_printer; struct drm_xe_engine_class_instance; struct xe_device; -struct xe_sched_job; +struct xe_exec_queue; #ifdef CONFIG_DRM_XE_JOB_TIMEOUT_MIN #define XE_HW_ENGINE_JOB_TIMEOUT_MIN CONFIG_DRM_XE_JOB_TIMEOUT_MIN @@ -56,7 +56,7 @@ void xe_hw_engine_enable_ring(struct xe_hw_engine *hwe); u32 xe_hw_engine_mask_per_class(struct xe_gt *gt, enum xe_engine_class engine_class); struct xe_hw_engine_snapshot * -xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe, struct xe_sched_job *job); +xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe, struct xe_exec_queue *q); void xe_hw_engine_snapshot_free(struct xe_hw_engine_snapshot *snapshot); void xe_hw_engine_print(struct xe_hw_engine *hwe, struct drm_printer *p); void xe_hw_engine_setup_default_lrc_state(struct xe_hw_engine *hwe); diff --git a/drivers/gpu/drm/xe/xe_hw_engine_types.h b/drivers/gpu/drm/xe/xe_hw_engine_types.h index 719f27ef00a5..e14bee95e364 100644 --- a/drivers/gpu/drm/xe/xe_hw_engine_types.h +++ b/drivers/gpu/drm/xe/xe_hw_engine_types.h @@ -165,8 +165,6 @@ enum xe_hw_engine_snapshot_source_id { struct xe_hw_engine_snapshot { /** @name: name of the hw engine */ char *name; - /** @source: Data source, either manual or GuC */ - enum xe_hw_engine_snapshot_source_id source; /** @hwe: hw engine */ struct xe_hw_engine *hwe; /** @logical_instance: logical instance of this hw engine */ diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c index b7995ebd54ab..1c509e66694d 100644 --- a/drivers/gpu/drm/xe/xe_irq.c +++ b/drivers/gpu/drm/xe/xe_irq.c @@ -192,7 +192,7 @@ void xe_irq_enable_hwe(struct xe_gt *gt) if (xe_hw_engine_mask_per_class(gt, XE_ENGINE_CLASS_OTHER)) { gsc_mask = irqs | GSC_ER_COMPLETE; heci_mask = GSC_IRQ_INTF(1); - } else if (HAS_HECI_GSCFI(xe)) { + } else if (xe->info.has_heci_gscfi) { gsc_mask = GSC_IRQ_INTF(1); } @@ -325,7 +325,7 @@ static void gt_irq_handler(struct xe_tile *tile, if (class == XE_ENGINE_CLASS_OTHER) { /* HECI GSCFI interrupts come from outside of GT */ - if (HAS_HECI_GSCFI(xe) && instance == OTHER_GSC_INSTANCE) + if (xe->info.has_heci_gscfi && instance == OTHER_GSC_INSTANCE) xe_heci_gsc_irq_handler(xe, intr_vec); else gt_other_irq_handler(engine_gt, instance, intr_vec); @@ -348,12 +348,8 @@ static irqreturn_t xelp_irq_handler(int irq, void *arg) unsigned long intr_dw[2]; u32 identity[32]; - spin_lock(&xe->irq.lock); - if (!xe->irq.enabled) { - spin_unlock(&xe->irq.lock); + if (!atomic_read(&xe->irq.enabled)) return IRQ_NONE; - } - spin_unlock(&xe->irq.lock); master_ctl = xelp_intr_disable(xe); if (!master_ctl) { @@ -417,12 +413,8 @@ static irqreturn_t dg1_irq_handler(int irq, void *arg) /* TODO: This really shouldn't be copied+pasted */ - spin_lock(&xe->irq.lock); - if (!xe->irq.enabled) { - spin_unlock(&xe->irq.lock); + if (!atomic_read(&xe->irq.enabled)) return IRQ_NONE; - } - spin_unlock(&xe->irq.lock); master_tile_ctl = dg1_intr_disable(xe); if (!master_tile_ctl) { @@ -459,7 +451,7 @@ static irqreturn_t dg1_irq_handler(int irq, void *arg) * the primary tile. */ if (id == 0) { - if (HAS_HECI_CSCFI(xe)) + if (xe->info.has_heci_cscfi) xe_heci_csc_irq_handler(xe, master_ctl); xe_display_irq_handler(xe, master_ctl); gu_misc_iir = gu_misc_irq_ack(xe, master_ctl); @@ -508,7 +500,7 @@ static void gt_irq_reset(struct xe_tile *tile) if ((tile->media_gt && xe_hw_engine_mask_per_class(tile->media_gt, XE_ENGINE_CLASS_OTHER)) || - HAS_HECI_GSCFI(tile_to_xe(tile))) { + tile_to_xe(tile)->info.has_heci_gscfi) { xe_mmio_write32(mmio, GUNIT_GSC_INTR_ENABLE, 0); xe_mmio_write32(mmio, GUNIT_GSC_INTR_MASK, ~0); xe_mmio_write32(mmio, HECI2_RSVD_INTR_MASK, ~0); @@ -644,12 +636,8 @@ static irqreturn_t vf_mem_irq_handler(int irq, void *arg) struct xe_tile *tile; unsigned int id; - spin_lock(&xe->irq.lock); - if (!xe->irq.enabled) { - spin_unlock(&xe->irq.lock); + if (!atomic_read(&xe->irq.enabled)) return IRQ_NONE; - } - spin_unlock(&xe->irq.lock); for_each_tile(tile, xe, id) xe_memirq_handler(&tile->memirq); @@ -674,10 +662,9 @@ static void irq_uninstall(void *arg) struct pci_dev *pdev = to_pci_dev(xe->drm.dev); int irq; - if (!xe->irq.enabled) + if (!atomic_xchg(&xe->irq.enabled, 0)) return; - xe->irq.enabled = false; xe_irq_reset(xe); irq = pci_irq_vector(pdev, 0); @@ -724,7 +711,7 @@ int xe_irq_install(struct xe_device *xe) return err; } - xe->irq.enabled = true; + atomic_set(&xe->irq.enabled, 1); xe_irq_postinstall(xe); @@ -744,9 +731,7 @@ void xe_irq_suspend(struct xe_device *xe) { int irq = to_pci_dev(xe->drm.dev)->irq; - spin_lock_irq(&xe->irq.lock); - xe->irq.enabled = false; /* no new irqs */ - spin_unlock_irq(&xe->irq.lock); + atomic_set(&xe->irq.enabled, 0); /* no new irqs */ synchronize_irq(irq); /* flush irqs */ xe_irq_reset(xe); /* turn irqs off */ @@ -762,7 +747,7 @@ void xe_irq_resume(struct xe_device *xe) * 1. no irq will arrive before the postinstall * 2. display is not yet resumed */ - xe->irq.enabled = true; + atomic_set(&xe->irq.enabled, 1); xe_irq_reset(xe); xe_irq_postinstall(xe); /* turn irqs on */ diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c index 4f64c7f4e68d..22e58c6e2a35 100644 --- a/drivers/gpu/drm/xe/xe_lrc.c +++ b/drivers/gpu/drm/xe/xe_lrc.c @@ -25,6 +25,7 @@ #include "xe_map.h" #include "xe_memirq.h" #include "xe_sriov.h" +#include "xe_trace_lrc.h" #include "xe_vm.h" #include "xe_wa.h" @@ -1060,6 +1061,14 @@ u32 xe_lrc_ring_tail(struct xe_lrc *lrc) return xe_lrc_read_ctx_reg(lrc, CTX_RING_TAIL) & TAIL_ADDR; } +static u32 xe_lrc_ring_start(struct xe_lrc *lrc) +{ + if (xe_lrc_has_indirect_ring_state(lrc)) + return xe_lrc_read_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_START); + else + return xe_lrc_read_ctx_reg(lrc, CTX_RING_START); +} + void xe_lrc_set_ring_head(struct xe_lrc *lrc, u32 head) { if (xe_lrc_has_indirect_ring_state(lrc)) @@ -1635,10 +1644,12 @@ struct xe_lrc_snapshot *xe_lrc_snapshot_capture(struct xe_lrc *lrc) xe_vm_get(lrc->bo->vm); snapshot->context_desc = xe_lrc_ggtt_addr(lrc); + snapshot->ring_addr = __xe_lrc_ring_ggtt_addr(lrc); snapshot->indirect_context_desc = xe_lrc_indirect_ring_ggtt_addr(lrc); snapshot->head = xe_lrc_ring_head(lrc); snapshot->tail.internal = lrc->ring.tail; snapshot->tail.memory = xe_lrc_ring_tail(lrc); + snapshot->start = xe_lrc_ring_start(lrc); snapshot->start_seqno = xe_lrc_start_seqno(lrc); snapshot->seqno = xe_lrc_seqno(lrc); snapshot->lrc_bo = xe_bo_get(lrc->bo); @@ -1692,11 +1703,14 @@ void xe_lrc_snapshot_print(struct xe_lrc_snapshot *snapshot, struct drm_printer return; drm_printf(p, "\tHW Context Desc: 0x%08x\n", snapshot->context_desc); + drm_printf(p, "\tHW Ring address: 0x%08x\n", + snapshot->ring_addr); drm_printf(p, "\tHW Indirect Ring State: 0x%08x\n", snapshot->indirect_context_desc); drm_printf(p, "\tLRC Head: (memory) %u\n", snapshot->head); drm_printf(p, "\tLRC Tail: (internal) %u, (memory) %u\n", snapshot->tail.internal, snapshot->tail.memory); + drm_printf(p, "\tRing start: (memory) 0x%08x\n", snapshot->start); drm_printf(p, "\tStart seqno: (memory) %d\n", snapshot->start_seqno); drm_printf(p, "\tSeqno: (memory) %d\n", snapshot->seqno); drm_printf(p, "\tTimestamp: 0x%08x\n", snapshot->ctx_timestamp); @@ -1758,5 +1772,20 @@ u32 xe_lrc_update_timestamp(struct xe_lrc *lrc, u32 *old_ts) lrc->ctx_timestamp = xe_lrc_ctx_timestamp(lrc); + trace_xe_lrc_update_timestamp(lrc, *old_ts); + return lrc->ctx_timestamp; } + +/** + * xe_lrc_ring_is_idle() - LRC is idle + * @lrc: Pointer to the lrc. + * + * Compare LRC ring head and tail to determine if idle. + * + * Return: True is ring is idle, False otherwise + */ +bool xe_lrc_ring_is_idle(struct xe_lrc *lrc) +{ + return xe_lrc_ring_head(lrc) == xe_lrc_ring_tail(lrc); +} diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h index 40d8f6906d3e..b459dcab8787 100644 --- a/drivers/gpu/drm/xe/xe_lrc.h +++ b/drivers/gpu/drm/xe/xe_lrc.h @@ -25,8 +25,10 @@ struct xe_lrc_snapshot { unsigned long lrc_size, lrc_offset; u32 context_desc; + u32 ring_addr; u32 indirect_context_desc; u32 head; + u32 start; struct { u32 internal; u32 memory; @@ -78,6 +80,8 @@ u32 xe_lrc_ring_head(struct xe_lrc *lrc); u32 xe_lrc_ring_space(struct xe_lrc *lrc); void xe_lrc_write_ring(struct xe_lrc *lrc, const void *data, size_t size); +bool xe_lrc_ring_is_idle(struct xe_lrc *lrc); + u32 xe_lrc_indirect_ring_ggtt_addr(struct xe_lrc *lrc); u32 xe_lrc_ggtt_addr(struct xe_lrc *lrc); u32 *xe_lrc_regs(struct xe_lrc *lrc); diff --git a/drivers/gpu/drm/xe/xe_macros.h b/drivers/gpu/drm/xe/xe_macros.h index daf56c846d03..8a77c2423555 100644 --- a/drivers/gpu/drm/xe/xe_macros.h +++ b/drivers/gpu/drm/xe/xe_macros.h @@ -10,9 +10,13 @@ #define XE_WARN_ON WARN_ON -#define XE_IOCTL_DBG(xe, cond) \ - ((cond) && (drm_dbg(&(xe)->drm, \ - "Ioctl argument check failed at %s:%d: %s", \ - __FILE__, __LINE__, #cond), 1)) +#define XE_IOCTL_DBG(xe, cond) ({ \ + int cond__ = !!(cond); \ + if (cond__) \ + drm_dbg(&(xe)->drm, \ + "Ioctl argument check failed at %s:%d: %s", \ + __FILE__, __LINE__, #cond); \ + cond__; \ +}) #endif diff --git a/drivers/gpu/drm/xe/xe_memirq.c b/drivers/gpu/drm/xe/xe_memirq.c index f833da88150a..404fa2a456d5 100644 --- a/drivers/gpu/drm/xe/xe_memirq.c +++ b/drivers/gpu/drm/xe/xe_memirq.c @@ -155,13 +155,6 @@ static const char *guc_name(struct xe_guc *guc) * */ -static void __release_xe_bo(struct drm_device *drm, void *arg) -{ - struct xe_bo *bo = arg; - - xe_bo_unpin_map_no_vm(bo); -} - static inline bool hw_reports_to_instance_zero(struct xe_memirq *memirq) { /* @@ -184,14 +177,12 @@ static int memirq_alloc_pages(struct xe_memirq *memirq) BUILD_BUG_ON(!IS_ALIGNED(XE_MEMIRQ_SOURCE_OFFSET(0), SZ_64)); BUILD_BUG_ON(!IS_ALIGNED(XE_MEMIRQ_STATUS_OFFSET(0), SZ_4K)); - /* XXX: convert to managed bo */ - bo = xe_bo_create_pin_map(xe, tile, NULL, bo_size, - ttm_bo_type_kernel, - XE_BO_FLAG_SYSTEM | - XE_BO_FLAG_GGTT | - XE_BO_FLAG_GGTT_INVALIDATE | - XE_BO_FLAG_NEEDS_UC | - XE_BO_FLAG_NEEDS_CPU_ACCESS); + bo = xe_managed_bo_create_pin_map(xe, tile, bo_size, + XE_BO_FLAG_SYSTEM | + XE_BO_FLAG_GGTT | + XE_BO_FLAG_GGTT_INVALIDATE | + XE_BO_FLAG_NEEDS_UC | + XE_BO_FLAG_NEEDS_CPU_ACCESS); if (IS_ERR(bo)) { err = PTR_ERR(bo); goto out; @@ -215,7 +206,7 @@ static int memirq_alloc_pages(struct xe_memirq *memirq) xe_bo_ggtt_addr(bo), bo_size, XE_MEMIRQ_SOURCE_OFFSET(0), XE_MEMIRQ_STATUS_OFFSET(0)); - return drmm_add_action_or_reset(&xe->drm, __release_xe_bo, memirq->bo); + return 0; out: memirq_err(memirq, "Failed to allocate memirq page (%pe)\n", ERR_PTR(err)); @@ -442,6 +433,9 @@ static void memirq_dispatch_guc(struct xe_memirq *memirq, struct iosys_map *stat if (memirq_received(memirq, status, ilog2(GUC_INTR_GUC2HOST), name)) xe_guc_irq_handler(guc, GUC_INTR_GUC2HOST); + + if (memirq_received(memirq, status, ilog2(GUC_INTR_SW_INT_0), name)) + xe_guc_irq_handler(guc, GUC_INTR_SW_INT_0); } /** diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c index bfc3deebdaa2..07b27114be9a 100644 --- a/drivers/gpu/drm/xe/xe_module.c +++ b/drivers/gpu/drm/xe/xe_module.c @@ -19,7 +19,7 @@ struct xe_modparam xe_modparam = { .probe_display = true, - .guc_log_level = 5, + .guc_log_level = 3, .force_probe = CONFIG_DRM_XE_FORCE_PROBE, .wedged_mode = 1, /* the rest are 0 by default */ diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c index 8dd55798ab31..ec88b18e9baa 100644 --- a/drivers/gpu/drm/xe/xe_oa.c +++ b/drivers/gpu/drm/xe/xe_oa.c @@ -96,6 +96,7 @@ struct xe_oa_open_param { struct drm_xe_sync __user *syncs_user; int num_syncs; struct xe_sync_entry *syncs; + size_t oa_buffer_size; }; struct xe_oa_config_bo { @@ -403,11 +404,19 @@ static int xe_oa_append_reports(struct xe_oa_stream *stream, char __user *buf, static void xe_oa_init_oa_buffer(struct xe_oa_stream *stream) { - struct xe_mmio *mmio = &stream->gt->mmio; u32 gtt_offset = xe_bo_ggtt_addr(stream->oa_buffer.bo); - u32 oa_buf = gtt_offset | OABUFFER_SIZE_16M | OAG_OABUFFER_MEMORY_SELECT; + int size_exponent = __ffs(stream->oa_buffer.bo->size); + u32 oa_buf = gtt_offset | OAG_OABUFFER_MEMORY_SELECT; + struct xe_mmio *mmio = &stream->gt->mmio; unsigned long flags; + /* + * If oa buffer size is more than 16MB (exponent greater than 24), the + * oa buffer size field is multiplied by 8 in xe_oa_enable_metric_set. + */ + oa_buf |= REG_FIELD_PREP(OABUFFER_SIZE_MASK, + size_exponent > 24 ? size_exponent - 20 : size_exponent - 17); + spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags); xe_mmio_write32(mmio, __oa_regs(stream)->oa_status, 0); @@ -901,15 +910,12 @@ static void xe_oa_stream_destroy(struct xe_oa_stream *stream) xe_file_put(stream->xef); } -static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream) +static int xe_oa_alloc_oa_buffer(struct xe_oa_stream *stream, size_t size) { struct xe_bo *bo; - BUILD_BUG_ON_NOT_POWER_OF_2(XE_OA_BUFFER_SIZE); - BUILD_BUG_ON(XE_OA_BUFFER_SIZE < SZ_128K || XE_OA_BUFFER_SIZE > SZ_16M); - bo = xe_bo_create_pin_map(stream->oa->xe, stream->gt->tile, NULL, - XE_OA_BUFFER_SIZE, ttm_bo_type_kernel, + size, ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT); if (IS_ERR(bo)) return PTR_ERR(bo); @@ -1087,6 +1093,13 @@ static u32 oag_report_ctx_switches(const struct xe_oa_stream *stream) 0 : OAG_OA_DEBUG_DISABLE_CTX_SWITCH_REPORTS); } +static u32 oag_buf_size_select(const struct xe_oa_stream *stream) +{ + return _MASKED_FIELD(OAG_OA_DEBUG_BUF_SIZE_SELECT, + stream->oa_buffer.bo->size > SZ_16M ? + OAG_OA_DEBUG_BUF_SIZE_SELECT : 0); +} + static int xe_oa_enable_metric_set(struct xe_oa_stream *stream) { struct xe_mmio *mmio = &stream->gt->mmio; @@ -1119,6 +1132,7 @@ static int xe_oa_enable_metric_set(struct xe_oa_stream *stream) xe_mmio_write32(mmio, __oa_regs(stream)->oa_debug, _MASKED_BIT_ENABLE(oa_debug) | oag_report_ctx_switches(stream) | + oag_buf_size_select(stream) | oag_configure_mmio_trigger(stream, true)); xe_mmio_write32(mmio, __oa_regs(stream)->oa_ctx_ctrl, stream->periodic ? @@ -1260,6 +1274,17 @@ static int xe_oa_set_prop_syncs_user(struct xe_oa *oa, u64 value, return 0; } +static int xe_oa_set_prop_oa_buffer_size(struct xe_oa *oa, u64 value, + struct xe_oa_open_param *param) +{ + if (!is_power_of_2(value) || value < SZ_128K || value > SZ_128M) { + drm_dbg(&oa->xe->drm, "OA buffer size invalid %llu\n", value); + return -EINVAL; + } + param->oa_buffer_size = value; + return 0; +} + static int xe_oa_set_prop_ret_inval(struct xe_oa *oa, u64 value, struct xe_oa_open_param *param) { @@ -1280,6 +1305,7 @@ static const xe_oa_set_property_fn xe_oa_set_property_funcs_open[] = { [DRM_XE_OA_PROPERTY_NO_PREEMPT] = xe_oa_set_no_preempt, [DRM_XE_OA_PROPERTY_NUM_SYNCS] = xe_oa_set_prop_num_syncs, [DRM_XE_OA_PROPERTY_SYNCS] = xe_oa_set_prop_syncs_user, + [DRM_XE_OA_PROPERTY_OA_BUFFER_SIZE] = xe_oa_set_prop_oa_buffer_size, }; static const xe_oa_set_property_fn xe_oa_set_property_funcs_config[] = { @@ -1294,6 +1320,7 @@ static const xe_oa_set_property_fn xe_oa_set_property_funcs_config[] = { [DRM_XE_OA_PROPERTY_NO_PREEMPT] = xe_oa_set_prop_ret_inval, [DRM_XE_OA_PROPERTY_NUM_SYNCS] = xe_oa_set_prop_num_syncs, [DRM_XE_OA_PROPERTY_SYNCS] = xe_oa_set_prop_syncs_user, + [DRM_XE_OA_PROPERTY_OA_BUFFER_SIZE] = xe_oa_set_prop_ret_inval, }; static int xe_oa_user_ext_set_property(struct xe_oa *oa, enum xe_oa_user_extn_from from, @@ -1553,7 +1580,7 @@ static long xe_oa_status_locked(struct xe_oa_stream *stream, unsigned long arg) static long xe_oa_info_locked(struct xe_oa_stream *stream, unsigned long arg) { - struct drm_xe_oa_stream_info info = { .oa_buf_size = XE_OA_BUFFER_SIZE, }; + struct drm_xe_oa_stream_info info = { .oa_buf_size = stream->oa_buffer.bo->size, }; void __user *uaddr = (void __user *)arg; if (copy_to_user(uaddr, &info, sizeof(info))) @@ -1639,7 +1666,7 @@ static int xe_oa_mmap(struct file *file, struct vm_area_struct *vma) } /* Can mmap the entire OA buffer or nothing (no partial OA buffer mmaps) */ - if (vma->vm_end - vma->vm_start != XE_OA_BUFFER_SIZE) { + if (vma->vm_end - vma->vm_start != stream->oa_buffer.bo->size) { drm_dbg(&stream->oa->xe->drm, "Wrong mmap size, must be OA buffer size\n"); return -EINVAL; } @@ -1783,9 +1810,10 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream, if (GRAPHICS_VER(stream->oa->xe) >= 20 && stream->hwe->oa_unit->type == DRM_XE_OA_UNIT_TYPE_OAG && stream->sample) stream->oa_buffer.circ_size = - XE_OA_BUFFER_SIZE - XE_OA_BUFFER_SIZE % stream->oa_buffer.format->size; + param->oa_buffer_size - + param->oa_buffer_size % stream->oa_buffer.format->size; else - stream->oa_buffer.circ_size = XE_OA_BUFFER_SIZE; + stream->oa_buffer.circ_size = param->oa_buffer_size; if (stream->exec_q && engine_supports_mi_query(stream->hwe)) { /* If we don't find the context offset, just return error */ @@ -1828,7 +1856,7 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream, goto err_fw_put; } - ret = xe_oa_alloc_oa_buffer(stream); + ret = xe_oa_alloc_oa_buffer(stream, param->oa_buffer_size); if (ret) goto err_fw_put; @@ -2125,6 +2153,9 @@ int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *f drm_dbg(&oa->xe->drm, "Using periodic sampling freq %lld Hz\n", oa_freq_hz); } + if (!param.oa_buffer_size) + param.oa_buffer_size = DEFAULT_XE_OA_BUFFER_SIZE; + ret = xe_oa_parse_syncs(oa, ¶m); if (ret) goto err_exec_q; diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h index fea9d981e414..df7793915628 100644 --- a/drivers/gpu/drm/xe/xe_oa_types.h +++ b/drivers/gpu/drm/xe/xe_oa_types.h @@ -15,7 +15,7 @@ #include "regs/xe_reg_defs.h" #include "xe_hw_engine_types.h" -#define XE_OA_BUFFER_SIZE SZ_16M +#define DEFAULT_XE_OA_BUFFER_SIZE SZ_16M enum xe_oa_report_header { HDR_32_BIT = 0, diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c index 40f7c844ed44..80699dbeb2e9 100644 --- a/drivers/gpu/drm/xe/xe_pm.c +++ b/drivers/gpu/drm/xe/xe_pm.c @@ -738,9 +738,6 @@ void xe_pm_d3cold_allowed_toggle(struct xe_device *xe) xe->d3cold.allowed = false; mutex_unlock(&xe->d3cold.lock); - - drm_dbg(&xe->drm, - "d3cold: allowed=%s\n", str_yes_no(xe->d3cold.allowed)); } /** diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c index 797576690356..65c3c1688710 100644 --- a/drivers/gpu/drm/xe/xe_pt.c +++ b/drivers/gpu/drm/xe/xe_pt.c @@ -136,6 +136,7 @@ err_kfree: xe_pt_free(pt); return ERR_PTR(err); } +ALLOW_ERROR_INJECTION(xe_pt_create, ERRNO); /** * xe_pt_populate_empty() - Populate a page-table bo with scratch- or zero @@ -1850,6 +1851,7 @@ int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops) return 0; } +ALLOW_ERROR_INJECTION(xe_pt_update_ops_prepare, ERRNO); static void bind_op_commit(struct xe_vm *vm, struct xe_tile *tile, struct xe_vm_pgtable_update_ops *pt_update_ops, @@ -2130,6 +2132,7 @@ kill_vm_tile1: return ERR_PTR(err); } +ALLOW_ERROR_INJECTION(xe_pt_update_ops_run, ERRNO); /** * xe_pt_update_ops_fini() - Finish PT update operations diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c index 170ae72d1a7b..d2a816f71bf2 100644 --- a/drivers/gpu/drm/xe/xe_query.c +++ b/drivers/gpu/drm/xe/xe_query.c @@ -23,6 +23,7 @@ #include "xe_guc_hwconfig.h" #include "xe_macros.h" #include "xe_mmio.h" +#include "xe_oa.h" #include "xe_ttm_vram_mgr.h" #include "xe_wa.h" @@ -670,7 +671,8 @@ static int query_oa_units(struct xe_device *xe, du->oa_unit_id = u->oa_unit_id; du->oa_unit_type = u->type; du->oa_timestamp_freq = xe_oa_timestamp_frequency(gt); - du->capabilities = DRM_XE_OA_CAPS_BASE | DRM_XE_OA_CAPS_SYNCS; + du->capabilities = DRM_XE_OA_CAPS_BASE | DRM_XE_OA_CAPS_SYNCS | + DRM_XE_OA_CAPS_OA_BUFFER_SIZE; j = 0; for_each_hw_engine(hwe, gt, hwe_id) { diff --git a/drivers/gpu/drm/xe/xe_reg_sr.c b/drivers/gpu/drm/xe/xe_reg_sr.c index c13123008e90..9475e3f74958 100644 --- a/drivers/gpu/drm/xe/xe_reg_sr.c +++ b/drivers/gpu/drm/xe/xe_reg_sr.c @@ -24,7 +24,6 @@ #include "xe_hw_engine_types.h" #include "xe_macros.h" #include "xe_mmio.h" -#include "xe_reg_whitelist.h" #include "xe_rtp_types.h" static void reg_sr_fini(struct drm_device *drm, void *arg) @@ -192,58 +191,6 @@ err_force_wake: xe_gt_err(gt, "Failed to apply, err=-ETIMEDOUT\n"); } -void xe_reg_sr_apply_whitelist(struct xe_hw_engine *hwe) -{ - struct xe_reg_sr *sr = &hwe->reg_whitelist; - struct xe_gt *gt = hwe->gt; - struct xe_device *xe = gt_to_xe(gt); - struct xe_reg_sr_entry *entry; - struct drm_printer p; - u32 mmio_base = hwe->mmio_base; - unsigned long reg; - unsigned int slot = 0; - unsigned int fw_ref; - - if (xa_empty(&sr->xa)) - return; - - drm_dbg(&xe->drm, "Whitelisting %s registers\n", sr->name); - - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) - goto err_force_wake; - - p = drm_dbg_printer(&xe->drm, DRM_UT_DRIVER, NULL); - xa_for_each(&sr->xa, reg, entry) { - if (slot == RING_MAX_NONPRIV_SLOTS) { - xe_gt_err(gt, - "hwe %s: maximum register whitelist slots (%d) reached, refusing to add more\n", - hwe->name, RING_MAX_NONPRIV_SLOTS); - break; - } - - xe_reg_whitelist_print_entry(&p, 0, reg, entry); - xe_mmio_write32(>->mmio, RING_FORCE_TO_NONPRIV(mmio_base, slot), - reg | entry->set_bits); - slot++; - } - - /* And clear the rest just in case of garbage */ - for (; slot < RING_MAX_NONPRIV_SLOTS; slot++) { - u32 addr = RING_NOPID(mmio_base).addr; - - xe_mmio_write32(>->mmio, RING_FORCE_TO_NONPRIV(mmio_base, slot), addr); - } - - xe_force_wake_put(gt_to_fw(gt), fw_ref); - - return; - -err_force_wake: - xe_force_wake_put(gt_to_fw(gt), fw_ref); - drm_err(&xe->drm, "Failed to apply, err=-ETIMEDOUT\n"); -} - /** * xe_reg_sr_dump - print all save/restore entries * @sr: Save/restore entries diff --git a/drivers/gpu/drm/xe/xe_reg_whitelist.c b/drivers/gpu/drm/xe/xe_reg_whitelist.c index 3996934974fa..edab5d4e3ba5 100644 --- a/drivers/gpu/drm/xe/xe_reg_whitelist.c +++ b/drivers/gpu/drm/xe/xe_reg_whitelist.c @@ -10,7 +10,9 @@ #include "regs/xe_oa_regs.h" #include "regs/xe_regs.h" #include "xe_gt_types.h" +#include "xe_gt_printk.h" #include "xe_platform_types.h" +#include "xe_reg_sr.h" #include "xe_rtp.h" #include "xe_step.h" @@ -89,6 +91,40 @@ static const struct xe_rtp_entry_sr register_whitelist[] = { {} }; +static void whitelist_apply_to_hwe(struct xe_hw_engine *hwe) +{ + struct xe_reg_sr *sr = &hwe->reg_whitelist; + struct xe_reg_sr_entry *entry; + struct drm_printer p; + unsigned long reg; + unsigned int slot; + + xe_gt_dbg(hwe->gt, "Add %s whitelist to engine\n", sr->name); + p = xe_gt_dbg_printer(hwe->gt); + + slot = 0; + xa_for_each(&sr->xa, reg, entry) { + struct xe_reg_sr_entry hwe_entry = { + .reg = RING_FORCE_TO_NONPRIV(hwe->mmio_base, slot), + .set_bits = entry->reg.addr | entry->set_bits, + .clr_bits = ~0u, + .read_mask = entry->read_mask, + }; + + if (slot == RING_MAX_NONPRIV_SLOTS) { + xe_gt_err(hwe->gt, + "hwe %s: maximum register whitelist slots (%d) reached, refusing to add more\n", + hwe->name, RING_MAX_NONPRIV_SLOTS); + break; + } + + xe_reg_whitelist_print_entry(&p, 0, reg, entry); + xe_reg_sr_add(&hwe->reg_sr, &hwe_entry, hwe->gt); + + slot++; + } +} + /** * xe_reg_whitelist_process_engine - process table of registers to whitelist * @hwe: engine instance to process whitelist for @@ -102,6 +138,7 @@ void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe) struct xe_rtp_process_ctx ctx = XE_RTP_PROCESS_CTX_INITIALIZER(hwe); xe_rtp_process_to_sr(&ctx, register_whitelist, &hwe->reg_whitelist); + whitelist_apply_to_hwe(hwe); } /** diff --git a/drivers/gpu/drm/xe/xe_sriov.c b/drivers/gpu/drm/xe/xe_sriov.c index ef10782af656..04e2f539ccd9 100644 --- a/drivers/gpu/drm/xe/xe_sriov.c +++ b/drivers/gpu/drm/xe/xe_sriov.c @@ -14,6 +14,7 @@ #include "xe_mmio.h" #include "xe_sriov.h" #include "xe_sriov_pf.h" +#include "xe_sriov_vf.h" /** * xe_sriov_mode_to_string - Convert enum value to string. @@ -114,6 +115,9 @@ int xe_sriov_init(struct xe_device *xe) return err; } + if (IS_SRIOV_VF(xe)) + xe_sriov_vf_init_early(xe); + xe_assert(xe, !xe->sriov.wq); xe->sriov.wq = alloc_workqueue("xe-sriov-wq", 0, 0); if (!xe->sriov.wq) diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_helpers.h b/drivers/gpu/drm/xe/xe_sriov_pf_helpers.h index 7d156ba82479..dd1df950b021 100644 --- a/drivers/gpu/drm/xe/xe_sriov_pf_helpers.h +++ b/drivers/gpu/drm/xe/xe_sriov_pf_helpers.h @@ -20,7 +20,7 @@ * is within a range of supported VF numbers (up to maximum number of VFs that * driver can support, including VF0 that represents the PF itself). * - * Note: Effective only on debug builds. See `Xe ASSERTs`_ for more information. + * Note: Effective only on debug builds. See `Xe Asserts`_ for more information. */ #define xe_sriov_pf_assert_vfid(xe, vfid) \ xe_assert((xe), (vfid) <= xe_sriov_pf_get_totalvfs(xe)) diff --git a/drivers/gpu/drm/xe/xe_sriov_types.h b/drivers/gpu/drm/xe/xe_sriov_types.h index c7b7ad4af5c8..ca94382a721e 100644 --- a/drivers/gpu/drm/xe/xe_sriov_types.h +++ b/drivers/gpu/drm/xe/xe_sriov_types.h @@ -9,6 +9,7 @@ #include <linux/build_bug.h> #include <linux/mutex.h> #include <linux/types.h> +#include <linux/workqueue_types.h> /** * VFID - Virtual Function Identifier @@ -56,4 +57,20 @@ struct xe_device_pf { struct mutex master_lock; }; +/** + * struct xe_device_vf - Xe Virtual Function related data + * + * The data in this structure is valid only if driver is running in the + * @XE_SRIOV_MODE_VF mode. + */ +struct xe_device_vf { + /** @migration: VF Migration state data */ + struct { + /** @migration.worker: VF migration recovery worker */ + struct work_struct worker; + /** @migration.gt_flags: Per-GT request flags for VF migration recovery */ + unsigned long gt_flags; + } migration; +}; + #endif diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c new file mode 100644 index 000000000000..c1275e64aa9c --- /dev/null +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c @@ -0,0 +1,263 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#include <drm/drm_managed.h> + +#include "xe_assert.h" +#include "xe_device.h" +#include "xe_gt_sriov_printk.h" +#include "xe_gt_sriov_vf.h" +#include "xe_pm.h" +#include "xe_sriov.h" +#include "xe_sriov_printk.h" +#include "xe_sriov_vf.h" + +/** + * DOC: VF restore procedure in PF KMD and VF KMD + * + * Restoring previously saved state of a VF is one of core features of + * SR-IOV. All major VM Management applications allow saving and restoring + * the VM state, and doing that to a VM which uses SRIOV VF as one of + * the accessible devices requires support from KMD on both PF and VF side. + * VMM initiates all required operations through VFIO module, which then + * translates them into PF KMD calls. This description will focus on these + * calls, leaving out the module which initiates these steps (VFIO). + * + * In order to start the restore procedure, GuC needs to keep the VF in + * proper state. The PF driver can ensure GuC set it to VF_READY state + * by provisioning the VF, which in turn can be done after Function Level + * Reset of said VF (or after it was freshly created - in that case FLR + * is not needed). The FLR procedure ends with GuC sending message + * `GUC_PF_NOTIFY_VF_FLR_DONE`, and then provisioning data is sent to GuC. + * After the provisioning is completed, the VF needs to be paused, and + * at that point the actual restore can begin. + * + * During VF Restore, state of several resources is restored. These may + * include local memory content (system memory is restored by VMM itself), + * values of MMIO registers, stateless compression metadata and others. + * The final resource which also needs restoring is state of the VF + * submission maintained within GuC. For that, `GUC_PF_OPCODE_VF_RESTORE` + * message is used, with reference to the state blob to be consumed by + * GuC. + * + * Next, when VFIO is asked to set the VM into running state, the PF driver + * sends `GUC_PF_TRIGGER_VF_RESUME` to GuC. When sent after restore, this + * changes VF state within GuC to `VF_RESFIX_BLOCKED` rather than the + * usual `VF_RUNNING`. At this point GuC triggers an interrupt to inform + * the VF KMD within the VM that it was migrated. + * + * As soon as Virtual GPU of the VM starts, the VF driver within receives + * the MIGRATED interrupt and schedules post-migration recovery worker. + * That worker queries GuC for new provisioning (using MMIO communication), + * and applies fixups to any non-virtualized resources used by the VF. + * + * When the VF driver is ready to continue operation on the newly connected + * hardware, it sends `VF2GUC_NOTIFY_RESFIX_DONE` which causes it to + * enter the long awaited `VF_RUNNING` state, and therefore start handling + * CTB messages and scheduling workloads from the VF:: + * + * PF GuC VF + * [ ] | | + * [ ] PF2GUC_VF_CONTROL(pause) | | + * [ ]---------------------------> [ ] | + * [ ] [ ] GuC sets new VF state to | + * [ ] [ ]------- VF_READY_PAUSED | + * [ ] [ ] | | + * [ ] [ ] <----- | + * [ ] success [ ] | + * [ ] <---------------------------[ ] | + * [ ] | | + * [ ] PF loads resources from the | | + * [ ]------- saved image supplied | | + * [ ] | | | + * [ ] <----- | | + * [ ] | | + * [ ] GUC_PF_OPCODE_VF_RESTORE | | + * [ ]---------------------------> [ ] | + * [ ] [ ] GuC loads contexts and CTB | + * [ ] [ ]------- state from image | + * [ ] [ ] | | + * [ ] [ ] <----- | + * [ ] [ ] | + * [ ] [ ] GuC sets new VF state to | + * [ ] [ ]------- VF_RESFIX_PAUSED | + * [ ] [ ] | | + * [ ] success [ ] <----- | + * [ ] <---------------------------[ ] | + * [ ] | | + * [ ] GUC_PF_TRIGGER_VF_RESUME | | + * [ ]---------------------------> [ ] | + * [ ] [ ] GuC sets new VF state to | + * [ ] [ ]------- VF_RESFIX_BLOCKED | + * [ ] [ ] | | + * [ ] [ ] <----- | + * [ ] [ ] | + * [ ] [ ] GUC_INTR_SW_INT_0 | + * [ ] success [ ]---------------------------> [ ] + * [ ] <---------------------------[ ] [ ] + * | | VF2GUC_QUERY_SINGLE_KLV [ ] + * | [ ] <---------------------------[ ] + * | [ ] [ ] + * | [ ] new VF provisioning [ ] + * | [ ]---------------------------> [ ] + * | | [ ] + * | | VF driver applies post [ ] + * | | migration fixups -------[ ] + * | | | [ ] + * | | -----> [ ] + * | | [ ] + * | | VF2GUC_NOTIFY_RESFIX_DONE [ ] + * | [ ] <---------------------------[ ] + * | [ ] [ ] + * | [ ] GuC sets new VF state to [ ] + * | [ ]------- VF_RUNNING [ ] + * | [ ] | [ ] + * | [ ] <----- [ ] + * | [ ] success [ ] + * | [ ]---------------------------> [ ] + * | | | + * | | | + */ + +static void migration_worker_func(struct work_struct *w); + +/** + * xe_sriov_vf_init_early - Initialize SR-IOV VF specific data. + * @xe: the &xe_device to initialize + */ +void xe_sriov_vf_init_early(struct xe_device *xe) +{ + INIT_WORK(&xe->sriov.vf.migration.worker, migration_worker_func); +} + +/** + * vf_post_migration_requery_guc - Re-query GuC for current VF provisioning. + * @xe: the &xe_device struct instance + * + * After migration, we need to re-query all VF configuration to make sure + * they match previous provisioning. Note that most of VF provisioning + * shall be the same, except GGTT range, since GGTT is not virtualized per-VF. + * + * Returns: 0 if the operation completed successfully, or a negative error + * code otherwise. + */ +static int vf_post_migration_requery_guc(struct xe_device *xe) +{ + struct xe_gt *gt; + unsigned int id; + int err, ret = 0; + + for_each_gt(gt, xe, id) { + err = xe_gt_sriov_vf_query_config(gt); + ret = ret ?: err; + } + + return ret; +} + +/* + * vf_post_migration_imminent - Check if post-restore recovery is coming. + * @xe: the &xe_device struct instance + * + * Return: True if migration recovery worker will soon be running. Any worker currently + * executing does not affect the result. + */ +static bool vf_post_migration_imminent(struct xe_device *xe) +{ + return xe->sriov.vf.migration.gt_flags != 0 || + work_pending(&xe->sriov.vf.migration.worker); +} + +/* + * Notify all GuCs about resource fixups apply finished. + */ +static void vf_post_migration_notify_resfix_done(struct xe_device *xe) +{ + struct xe_gt *gt; + unsigned int id; + + for_each_gt(gt, xe, id) { + if (vf_post_migration_imminent(xe)) + goto skip; + xe_gt_sriov_vf_notify_resfix_done(gt); + } + return; + +skip: + drm_dbg(&xe->drm, "another recovery imminent, skipping notifications\n"); +} + +static void vf_post_migration_recovery(struct xe_device *xe) +{ + int err; + + drm_dbg(&xe->drm, "migration recovery in progress\n"); + xe_pm_runtime_get(xe); + err = vf_post_migration_requery_guc(xe); + if (vf_post_migration_imminent(xe)) + goto defer; + if (unlikely(err)) + goto fail; + + /* FIXME: add the recovery steps */ + vf_post_migration_notify_resfix_done(xe); + xe_pm_runtime_put(xe); + drm_notice(&xe->drm, "migration recovery ended\n"); + return; +defer: + xe_pm_runtime_put(xe); + drm_dbg(&xe->drm, "migration recovery deferred\n"); + return; +fail: + xe_pm_runtime_put(xe); + drm_err(&xe->drm, "migration recovery failed (%pe)\n", ERR_PTR(err)); + xe_device_declare_wedged(xe); +} + +static void migration_worker_func(struct work_struct *w) +{ + struct xe_device *xe = container_of(w, struct xe_device, + sriov.vf.migration.worker); + + vf_post_migration_recovery(xe); +} + +static bool vf_ready_to_recovery_on_all_gts(struct xe_device *xe) +{ + struct xe_gt *gt; + unsigned int id; + + for_each_gt(gt, xe, id) { + if (!test_bit(id, &xe->sriov.vf.migration.gt_flags)) { + xe_gt_sriov_dbg_verbose(gt, "still not ready to recover\n"); + return false; + } + } + return true; +} + +/** + * xe_sriov_vf_start_migration_recovery - Start VF migration recovery. + * @xe: the &xe_device to start recovery on + * + * This function shall be called only by VF. + */ +void xe_sriov_vf_start_migration_recovery(struct xe_device *xe) +{ + bool started; + + xe_assert(xe, IS_SRIOV_VF(xe)); + + if (!vf_ready_to_recovery_on_all_gts(xe)) + return; + + WRITE_ONCE(xe->sriov.vf.migration.gt_flags, 0); + /* Ensure other threads see that no flags are set now. */ + smp_mb(); + + started = queue_work(xe->sriov.wq, &xe->sriov.vf.migration.worker); + drm_info(&xe->drm, "VF migration recovery %s\n", started ? + "scheduled" : "already in progress"); +} diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.h b/drivers/gpu/drm/xe/xe_sriov_vf.h new file mode 100644 index 000000000000..7b8622cff2b7 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_sriov_vf.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2023-2024 Intel Corporation + */ + +#ifndef _XE_SRIOV_VF_H_ +#define _XE_SRIOV_VF_H_ + +struct xe_device; + +void xe_sriov_vf_init_early(struct xe_device *xe); +void xe_sriov_vf_start_migration_recovery(struct xe_device *xe); + +#endif diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h index 91130ad8999c..d5281de04d54 100644 --- a/drivers/gpu/drm/xe/xe_trace.h +++ b/drivers/gpu/drm/xe/xe_trace.h @@ -211,6 +211,7 @@ DECLARE_EVENT_CLASS(xe_sched_job, __string(dev, __dev_name_eq(job->q)) __field(u32, seqno) __field(u32, lrc_seqno) + __field(u8, gt_id) __field(u16, guc_id) __field(u32, guc_state) __field(u32, flags) @@ -223,6 +224,7 @@ DECLARE_EVENT_CLASS(xe_sched_job, __assign_str(dev); __entry->seqno = xe_sched_job_seqno(job); __entry->lrc_seqno = xe_sched_job_lrc_seqno(job); + __entry->gt_id = job->q->gt->info.id; __entry->guc_id = job->q->guc->id; __entry->guc_state = atomic_read(&job->q->guc->state); @@ -232,9 +234,9 @@ DECLARE_EVENT_CLASS(xe_sched_job, __entry->batch_addr = (u64)job->ptrs[0].batch_addr; ), - TP_printk("dev=%s, fence=%p, seqno=%u, lrc_seqno=%u, guc_id=%d, batch_addr=0x%012llx, guc_state=0x%x, flags=0x%x, error=%d", + TP_printk("dev=%s, fence=%p, seqno=%u, lrc_seqno=%u, gt=%u, guc_id=%d, batch_addr=0x%012llx, guc_state=0x%x, flags=0x%x, error=%d", __get_str(dev), __entry->fence, __entry->seqno, - __entry->lrc_seqno, __entry->guc_id, + __entry->lrc_seqno, __entry->gt_id, __entry->guc_id, __entry->batch_addr, __entry->guc_state, __entry->flags, __entry->error) ); @@ -282,6 +284,7 @@ DECLARE_EVENT_CLASS(xe_sched_msg, __string(dev, __dev_name_eq(((struct xe_exec_queue *)msg->private_data))) __field(u32, opcode) __field(u16, guc_id) + __field(u8, gt_id) ), TP_fast_assign( @@ -289,9 +292,11 @@ DECLARE_EVENT_CLASS(xe_sched_msg, __entry->opcode = msg->opcode; __entry->guc_id = ((struct xe_exec_queue *)msg->private_data)->guc->id; + __entry->gt_id = + ((struct xe_exec_queue *)msg->private_data)->gt->info.id; ), - TP_printk("dev=%s, guc_id=%d, opcode=%u", __get_str(dev), __entry->guc_id, + TP_printk("dev=%s, gt=%u guc_id=%d, opcode=%u", __get_str(dev), __entry->gt_id, __entry->guc_id, __entry->opcode) ); diff --git a/drivers/gpu/drm/xe/xe_trace_bo.h b/drivers/gpu/drm/xe/xe_trace_bo.h index 30a3cfbaaa09..1762dd30ba6d 100644 --- a/drivers/gpu/drm/xe/xe_trace_bo.h +++ b/drivers/gpu/drm/xe/xe_trace_bo.h @@ -48,6 +48,11 @@ DEFINE_EVENT(xe_bo, xe_bo_cpu_fault, TP_ARGS(bo) ); +DEFINE_EVENT(xe_bo, xe_bo_validate, + TP_PROTO(struct xe_bo *bo), + TP_ARGS(bo) +); + TRACE_EVENT(xe_bo_move, TP_PROTO(struct xe_bo *bo, uint32_t new_placement, uint32_t old_placement, bool move_lacks_source), diff --git a/drivers/gpu/drm/xe/xe_trace_lrc.c b/drivers/gpu/drm/xe/xe_trace_lrc.c new file mode 100644 index 000000000000..ab9b7e2970bc --- /dev/null +++ b/drivers/gpu/drm/xe/xe_trace_lrc.c @@ -0,0 +1,9 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef __CHECKER__ +#define CREATE_TRACE_POINTS +#include "xe_trace_lrc.h" +#endif diff --git a/drivers/gpu/drm/xe/xe_trace_lrc.h b/drivers/gpu/drm/xe/xe_trace_lrc.h new file mode 100644 index 000000000000..5c669a0b2180 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_trace_lrc.h @@ -0,0 +1,52 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright © 2024 Intel Corporation + */ + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM xe + +#if !defined(_XE_TRACE_LRC_H_) || defined(TRACE_HEADER_MULTI_READ) +#define _XE_TRACE_LRC_H_ + +#include <linux/tracepoint.h> +#include <linux/types.h> + +#include "xe_gt_types.h" +#include "xe_lrc.h" +#include "xe_lrc_types.h" + +#define __dev_name_lrc(lrc) dev_name(gt_to_xe((lrc)->fence_ctx.gt)->drm.dev) + +TRACE_EVENT(xe_lrc_update_timestamp, + TP_PROTO(struct xe_lrc *lrc, uint32_t old), + TP_ARGS(lrc, old), + TP_STRUCT__entry( + __field(struct xe_lrc *, lrc) + __field(u32, old) + __field(u32, new) + __string(name, lrc->fence_ctx.name) + __string(device_id, __dev_name_lrc(lrc)) + ), + + TP_fast_assign( + __entry->lrc = lrc; + __entry->old = old; + __entry->new = lrc->ctx_timestamp; + __assign_str(name); + __assign_str(device_id); + ), + TP_printk("lrc=:%p lrc->name=%s old=%u new=%u device_id:%s", + __entry->lrc, __get_str(name), + __entry->old, __entry->new, + __get_str(device_id)) +); + +#endif + +/* This part must be outside protection */ +#undef TRACE_INCLUDE_PATH +#undef TRACE_INCLUDE_FILE +#define TRACE_INCLUDE_PATH ../../drivers/gpu/drm/xe +#define TRACE_INCLUDE_FILE xe_trace_lrc +#include <trace/define_trace.h> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c index 423b261ea743..c95728c45ea4 100644 --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c @@ -52,7 +52,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man, struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man); struct xe_ttm_vram_mgr_resource *vres; struct drm_buddy *mm = &mgr->mm; - u64 size, remaining_size, min_page_size; + u64 size, min_page_size; unsigned long lpfn; int err; @@ -98,17 +98,6 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man, goto error_fini; } - if (WARN_ON(min_page_size > SZ_2G)) { /* FIXME: sg limit */ - err = -EINVAL; - goto error_fini; - } - - if (WARN_ON((size > SZ_2G && - (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS)))) { - err = -EINVAL; - goto error_fini; - } - if (WARN_ON(!IS_ALIGNED(size, min_page_size))) { err = -EINVAL; goto error_fini; @@ -116,12 +105,11 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man, mutex_lock(&mgr->lock); if (lpfn <= mgr->visible_size >> PAGE_SHIFT && size > mgr->visible_avail) { - mutex_unlock(&mgr->lock); err = -ENOSPC; - goto error_fini; + goto error_unlock; } - if (place->fpfn + (size >> PAGE_SHIFT) != place->lpfn && + if (place->fpfn + (size >> PAGE_SHIFT) != lpfn && place->flags & TTM_PL_FLAG_CONTIGUOUS) { size = roundup_pow_of_two(size); min_page_size = size; @@ -129,25 +117,11 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man, lpfn = max_t(unsigned long, place->fpfn + (size >> PAGE_SHIFT), lpfn); } - remaining_size = size; - do { - /* - * Limit maximum size to 2GiB due to SG table limitations. - * FIXME: Should maybe be handled as part of sg construction. - */ - u64 alloc_size = min_t(u64, remaining_size, SZ_2G); - - err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT, - (u64)lpfn << PAGE_SHIFT, - alloc_size, - min_page_size, - &vres->blocks, - vres->flags); - if (err) - goto error_free_blocks; - - remaining_size -= alloc_size; - } while (remaining_size); + err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT, + (u64)lpfn << PAGE_SHIFT, size, + min_page_size, &vres->blocks, vres->flags); + if (err) + goto error_unlock; if (place->flags & TTM_PL_FLAG_CONTIGUOUS) { if (!drm_buddy_block_trim(mm, NULL, vres->base.size, &vres->blocks)) @@ -194,9 +168,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man, *res = &vres->base; return 0; - -error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks, 0); +error_unlock: mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -393,7 +365,8 @@ int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe, xe_res_first(res, offset, length, &cursor); while (cursor.remaining) { num_entries++; - xe_res_next(&cursor, cursor.size); + /* Limit maximum size to 2GiB due to SG table limitations. */ + xe_res_next(&cursor, min_t(u64, cursor.size, SZ_2G)); } r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); @@ -413,7 +386,7 @@ int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe, xe_res_first(res, offset, length, &cursor); for_each_sgtable_sg((*sgt), sg, i) { phys_addr_t phys = cursor.start + tile->mem.vram.io_start; - size_t size = cursor.size; + size_t size = min_t(u64, cursor.size, SZ_2G); dma_addr_t addr; addr = dma_map_resource(dev, phys, size, dir, @@ -426,7 +399,7 @@ int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe, sg_dma_address(sg) = addr; sg_dma_len(sg) = size; - xe_res_next(&cursor, cursor.size); + xe_res_next(&cursor, size); } return 0; diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index c99380271de6..b4a44e1ea167 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -10,7 +10,6 @@ #include <drm/drm_exec.h> #include <drm/drm_print.h> -#include <drm/ttm/ttm_execbuf_util.h> #include <drm/ttm/ttm_tt.h> #include <uapi/drm/xe_drm.h> #include <linux/ascii85.h> @@ -733,13 +732,14 @@ static int xe_vma_ops_alloc(struct xe_vma_ops *vops, bool array_of_binds) vops->pt_update_ops[i].ops = kmalloc_array(vops->pt_update_ops[i].num_ops, sizeof(*vops->pt_update_ops[i].ops), - GFP_KERNEL); + GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN); if (!vops->pt_update_ops[i].ops) return array_of_binds ? -ENOBUFS : -ENOMEM; } return 0; } +ALLOW_ERROR_INJECTION(xe_vma_ops_alloc, ERRNO); static void xe_vma_ops_fini(struct xe_vma_ops *vops) { @@ -1352,6 +1352,7 @@ static int xe_vm_create_scratch(struct xe_device *xe, struct xe_tile *tile, return 0; } +ALLOW_ERROR_INJECTION(xe_vm_create_scratch, ERRNO); static void xe_vm_free_scratch(struct xe_vm *vm) { @@ -1978,6 +1979,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, return ops; } +ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_create, ERRNO); static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op, u16 pat_index, unsigned int flags) @@ -2357,13 +2359,15 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma, bool validate) { struct xe_bo *bo = xe_vma_bo(vma); + struct xe_vm *vm = xe_vma_vm(vma); int err = 0; if (bo) { if (!bo->vm) err = drm_exec_lock_obj(exec, &bo->ttm.base); if (!err && validate) - err = xe_bo_validate(bo, xe_vma_vm(vma), true); + err = xe_bo_validate(bo, vm, + !xe_vm_in_preempt_fence_mode(vm)); } return err; @@ -2697,6 +2701,7 @@ unlock: drm_exec_fini(&exec); return err; } +ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_execute, ERRNO); #define SUPPORTED_FLAGS_STUB \ (DRM_XE_VM_BIND_FLAG_READONLY | \ @@ -2733,7 +2738,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe, *bind_ops = kvmalloc_array(args->num_binds, sizeof(struct drm_xe_vm_bind_op), - GFP_KERNEL | __GFP_ACCOUNT); + GFP_KERNEL | __GFP_ACCOUNT | + __GFP_RETRY_MAYFAIL | __GFP_NOWARN); if (!*bind_ops) return args->num_binds > 1 ? -ENOBUFS : -ENOMEM; @@ -2973,14 +2979,16 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file) if (args->num_binds) { bos = kvcalloc(args->num_binds, sizeof(*bos), - GFP_KERNEL | __GFP_ACCOUNT); + GFP_KERNEL | __GFP_ACCOUNT | + __GFP_RETRY_MAYFAIL | __GFP_NOWARN); if (!bos) { err = -ENOMEM; goto release_vm_lock; } ops = kvcalloc(args->num_binds, sizeof(*ops), - GFP_KERNEL | __GFP_ACCOUNT); + GFP_KERNEL | __GFP_ACCOUNT | + __GFP_RETRY_MAYFAIL | __GFP_NOWARN); if (!ops) { err = -ENOMEM; goto release_vm_lock; @@ -3303,7 +3311,6 @@ void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap) for (int i = 0; i < snap->num_snaps; i++) { struct xe_bo *bo = snap->snap[i].bo; - struct iosys_map src; int err; if (IS_ERR(snap->snap[i].data)) @@ -3316,16 +3323,8 @@ void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap) } if (bo) { - xe_bo_lock(bo, false); - err = ttm_bo_vmap(&bo->ttm, &src); - if (!err) { - xe_map_memcpy_from(xe_bo_device(bo), - snap->snap[i].data, - &src, snap->snap[i].bo_ofs, - snap->snap[i].len); - ttm_bo_vunmap(&bo->ttm, &src); - } - xe_bo_unlock(bo); + err = xe_bo_read(bo, snap->snap[i].bo_ofs, + snap->snap[i].data, snap->snap[i].len); } else { void __user *userptr = (void __user *)(size_t)snap->snap[i].bo_ofs; diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h index c864dba35e1d..23adb7442881 100644 --- a/drivers/gpu/drm/xe/xe_vm.h +++ b/drivers/gpu/drm/xe/xe_vm.h @@ -17,7 +17,6 @@ struct drm_printer; struct drm_file; struct ttm_buffer_object; -struct ttm_validate_buffer; struct xe_exec_queue; struct xe_file; diff --git a/drivers/gpu/drm/xe/xe_vm_doc.h b/drivers/gpu/drm/xe/xe_vm_doc.h index 4d33f310b653..078786958403 100644 --- a/drivers/gpu/drm/xe/xe_vm_doc.h +++ b/drivers/gpu/drm/xe/xe_vm_doc.h @@ -64,8 +64,8 @@ * update page level 2 PDE[1] to page level 3b phys address (GPU) * * bind BO2 0x1ff000-0x201000 - * update page level 3a PTE[511] to BO2 phys addres (GPU) - * update page level 3b PTE[0] to BO2 phys addres + 0x1000 (GPU) + * update page level 3a PTE[511] to BO2 phys address (GPU) + * update page level 3b PTE[0] to BO2 phys address + 0x1000 (GPU) * * GPU bypass * ~~~~~~~~~~ @@ -192,7 +192,7 @@ * * If a VM is in fault mode (TODO: link to fault mode), new bind operations that * create mappings are by default deferred to the page fault handler (first - * use). This behavior can be overriden by setting the flag + * use). This behavior can be overridden by setting the flag * DRM_XE_VM_BIND_FLAG_IMMEDIATE which indicates to creating the mapping * immediately. * @@ -209,7 +209,7 @@ * * Since this a core kernel managed memory the kernel can move this memory * whenever it wants. We register an invalidation MMU notifier to alert XE when - * a user poiter is about to move. The invalidation notifier needs to block + * a user pointer is about to move. The invalidation notifier needs to block * until all pending users (jobs or compute mode engines) of the userptr are * idle to ensure no faults. This done by waiting on all of VM's dma-resv slots. * @@ -252,7 +252,7 @@ * Rebind worker * ------------- * - * The rebind worker is very similar to an exec. It is resposible for rebinding + * The rebind worker is very similar to an exec. It is responsible for rebinding * evicted BOs or userptrs, waiting on those operations, installing new preempt * fences, and finally resuming executing of engines in the VM. * @@ -317,11 +317,11 @@ * are not allowed, only long running workloads and ULLS are enabled on a faulting * VM. * - * Defered VM binds + * Deferred VM binds * ---------------- * * By default, on a faulting VM binds just allocate the VMA and the actual - * updating of the page tables is defered to the page fault handler. This + * updating of the page tables is deferred to the page fault handler. This * behavior can be overridden by setting the flag DRM_XE_VM_BIND_FLAG_IMMEDIATE in * the VM bind which will then do the bind immediately. * @@ -500,18 +500,18 @@ * Slot waiting * ------------ * - * 1. The exection of all jobs from kernel ops shall wait on all slots + * 1. The execution of all jobs from kernel ops shall wait on all slots * (DMA_RESV_USAGE_PREEMPT_FENCE) of either an external BO or VM (depends on if * kernel op is operating on external or private BO) * - * 2. In non-compute mode, the exection of all jobs from rebinds in execs shall + * 2. In non-compute mode, the execution of all jobs from rebinds in execs shall * wait on the DMA_RESV_USAGE_KERNEL slot of either an external BO or VM * (depends on if the rebind is operatiing on an external or private BO) * - * 3. In non-compute mode, the exection of all jobs from execs shall wait on the + * 3. In non-compute mode, the execution of all jobs from execs shall wait on the * last rebind job * - * 4. In compute mode, the exection of all jobs from rebinds in the rebind + * 4. In compute mode, the execution of all jobs from rebinds in the rebind * worker shall wait on the DMA_RESV_USAGE_KERNEL slot of either an external BO * or VM (depends on if rebind is operating on external or private BO) * diff --git a/drivers/gpu/drm/xe/xe_vsec.c b/drivers/gpu/drm/xe/xe_vsec.c new file mode 100644 index 000000000000..b378848d3b7b --- /dev/null +++ b/drivers/gpu/drm/xe/xe_vsec.c @@ -0,0 +1,233 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright © 2024 Intel Corporation */ +#include <linux/bitfield.h> +#include <linux/bits.h> +#include <linux/cleanup.h> +#include <linux/errno.h> +#include <linux/intel_vsec.h> +#include <linux/module.h> +#include <linux/mutex.h> +#include <linux/pci.h> +#include <linux/types.h> + +#include "xe_device.h" +#include "xe_device_types.h" +#include "xe_drv.h" +#include "xe_mmio.h" +#include "xe_platform_types.h" +#include "xe_pm.h" +#include "xe_vsec.h" + +#include "regs/xe_pmt.h" + +/* PMT GUID value for BMG devices. NOTE: this is NOT a PCI id */ +#define BMG_DEVICE_ID 0xE2F8 + +static struct intel_vsec_header bmg_telemetry = { + .length = 0x10, + .id = VSEC_ID_TELEMETRY, + .num_entries = 2, + .entry_size = 4, + .tbir = 0, + .offset = BMG_DISCOVERY_OFFSET, +}; + +static struct intel_vsec_header bmg_punit_crashlog = { + .length = 0x10, + .id = VSEC_ID_CRASHLOG, + .num_entries = 1, + .entry_size = 4, + .tbir = 0, + .offset = BMG_DISCOVERY_OFFSET + 0x60, +}; + +static struct intel_vsec_header bmg_oobmsm_crashlog = { + .length = 0x10, + .id = VSEC_ID_CRASHLOG, + .num_entries = 1, + .entry_size = 4, + .tbir = 0, + .offset = BMG_DISCOVERY_OFFSET + 0x78, +}; + +static struct intel_vsec_header *bmg_capabilities[] = { + &bmg_telemetry, + &bmg_punit_crashlog, + &bmg_oobmsm_crashlog, + NULL +}; + +enum xe_vsec { + XE_VSEC_UNKNOWN = 0, + XE_VSEC_BMG, +}; + +static struct intel_vsec_platform_info xe_vsec_info[] = { + [XE_VSEC_BMG] = { + .caps = VSEC_CAP_TELEMETRY | VSEC_CAP_CRASHLOG, + .headers = bmg_capabilities, + }, + { } +}; + +/* + * The GUID will have the following bits to decode: + * [0:3] - {Telemetry space iteration number (0,1,..)} + * [4:7] - Segment (SEGMENT_INDEPENDENT-0, Client-1, Server-2) + * [8:11] - SOC_SKU + * [12:27] – Device ID – changes for each down bin SKU’s + * [28:29] - Capability Type (Crashlog-0, Telemetry Aggregator-1, Watcher-2) + * [30:31] - Record-ID (0-PUNIT, 1-OOBMSM_0, 2-OOBMSM_1) + */ +#define GUID_TELEM_ITERATION GENMASK(3, 0) +#define GUID_SEGMENT GENMASK(7, 4) +#define GUID_SOC_SKU GENMASK(11, 8) +#define GUID_DEVICE_ID GENMASK(27, 12) +#define GUID_CAP_TYPE GENMASK(29, 28) +#define GUID_RECORD_ID GENMASK(31, 30) + +#define PUNIT_TELEMETRY_OFFSET 0x0200 +#define PUNIT_WATCHER_OFFSET 0x14A0 +#define OOBMSM_0_WATCHER_OFFSET 0x18D8 +#define OOBMSM_1_TELEMETRY_OFFSET 0x1000 + +enum record_id { + PUNIT, + OOBMSM_0, + OOBMSM_1, +}; + +enum capability { + CRASHLOG, + TELEMETRY, + WATCHER, +}; + +static int xe_guid_decode(u32 guid, int *index, u32 *offset) +{ + u32 record_id = FIELD_GET(GUID_RECORD_ID, guid); + u32 cap_type = FIELD_GET(GUID_CAP_TYPE, guid); + u32 device_id = FIELD_GET(GUID_DEVICE_ID, guid); + + if (device_id != BMG_DEVICE_ID) + return -ENODEV; + + if (cap_type > WATCHER) + return -EINVAL; + + *offset = 0; + + if (cap_type == CRASHLOG) { + *index = record_id == PUNIT ? 2 : 4; + return 0; + } + + switch (record_id) { + case PUNIT: + *index = 0; + if (cap_type == TELEMETRY) + *offset = PUNIT_TELEMETRY_OFFSET; + else + *offset = PUNIT_WATCHER_OFFSET; + break; + + case OOBMSM_0: + *index = 1; + if (cap_type == WATCHER) + *offset = OOBMSM_0_WATCHER_OFFSET; + break; + + case OOBMSM_1: + *index = 1; + if (cap_type == TELEMETRY) + *offset = OOBMSM_1_TELEMETRY_OFFSET; + break; + default: + return -EINVAL; + } + + return 0; +} + +static int xe_pmt_telem_read(struct pci_dev *pdev, u32 guid, u64 *data, loff_t user_offset, + u32 count) +{ + struct xe_device *xe = pdev_to_xe_device(pdev); + void __iomem *telem_addr = xe->mmio.regs + BMG_TELEMETRY_OFFSET; + u32 mem_region; + u32 offset; + int ret; + + ret = xe_guid_decode(guid, &mem_region, &offset); + if (ret) + return ret; + + telem_addr += offset + user_offset; + + guard(mutex)(&xe->pmt.lock); + + /* indicate that we are not at an appropriate power level */ + if (!xe_pm_runtime_get_if_active(xe)) + return -ENODATA; + + /* set SoC re-mapper index register based on GUID memory region */ + xe_mmio_rmw32(xe_root_tile_mmio(xe), SG_REMAP_INDEX1, SG_REMAP_BITS, + REG_FIELD_PREP(SG_REMAP_BITS, mem_region)); + + memcpy_fromio(data, telem_addr, count); + xe_pm_runtime_put(xe); + + return count; +} + +static struct pmt_callbacks xe_pmt_cb = { + .read_telem = xe_pmt_telem_read, +}; + +static const int vsec_platforms[] = { + [XE_BATTLEMAGE] = XE_VSEC_BMG, +}; + +static enum xe_vsec get_platform_info(struct xe_device *xe) +{ + if (xe->info.platform > XE_BATTLEMAGE) + return XE_VSEC_UNKNOWN; + + return vsec_platforms[xe->info.platform]; +} + +/** + * xe_vsec_init - Initialize resources and add intel_vsec auxiliary + * interface + * @xe: valid xe instance + */ +void xe_vsec_init(struct xe_device *xe) +{ + struct intel_vsec_platform_info *info; + struct device *dev = xe->drm.dev; + struct pci_dev *pdev = to_pci_dev(dev); + enum xe_vsec platform; + + platform = get_platform_info(xe); + if (platform == XE_VSEC_UNKNOWN) + return; + + info = &xe_vsec_info[platform]; + if (!info->headers) + return; + + switch (platform) { + case XE_VSEC_BMG: + info->priv_data = &xe_pmt_cb; + break; + default: + break; + } + + /* + * Register a VSEC. Cleanup is handled using device managed + * resources. + */ + intel_vsec_register(pdev, info); +} +MODULE_IMPORT_NS("INTEL_VSEC"); diff --git a/drivers/gpu/drm/xe/xe_vsec.h b/drivers/gpu/drm/xe/xe_vsec.h new file mode 100644 index 000000000000..5777c53faec2 --- /dev/null +++ b/drivers/gpu/drm/xe/xe_vsec.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright © 2024 Intel Corporation */ + +#ifndef _XE_VSEC_H_ +#define _XE_VSEC_H_ + +struct xe_device; + +void xe_vsec_init(struct xe_device *xe); + +#endif diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c index 02cf647f86d8..570fe0376402 100644 --- a/drivers/gpu/drm/xe/xe_wa.c +++ b/drivers/gpu/drm/xe/xe_wa.c @@ -607,6 +607,12 @@ static const struct xe_rtp_entry_sr engine_was[] = { FUNC(xe_rtp_match_first_render_or_compute)), XE_RTP_ACTIONS(SET(ROW_CHICKEN4, DISABLE_TDL_PUSH)) }, + { XE_RTP_NAME("16024792527"), + XE_RTP_RULES(GRAPHICS_VERSION(3000), GRAPHICS_STEP(A0, B0), + FUNC(xe_rtp_match_first_render_or_compute)), + XE_RTP_ACTIONS(FIELD_SET(SAMPLER_MODE, SMP_WAIT_FETCH_MERGING_COUNTER, + SMP_FORCE_128B_OVERFETCH)) + }, {} }; diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules index bcd04464b85e..3ed12a85cc60 100644 --- a/drivers/gpu/drm/xe/xe_wa_oob.rules +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules @@ -1,3 +1,4 @@ +1607983814 GRAPHICS_VERSION_RANGE(1200, 1210) 22012773006 GRAPHICS_VERSION_RANGE(1200, 1250) 14014475959 GRAPHICS_VERSION_RANGE(1270, 1271), GRAPHICS_STEP(A0, B0) PLATFORM(DG2) diff --git a/drivers/gpu/drm/xen/xen_drm_front.c b/drivers/gpu/drm/xen/xen_drm_front.c index aab79c5e34c2..1bda7ef606cc 100644 --- a/drivers/gpu/drm/xen/xen_drm_front.c +++ b/drivers/gpu/drm/xen/xen_drm_front.c @@ -478,7 +478,6 @@ static const struct drm_driver xen_drm_driver = { .fops = &xen_drm_dev_fops, .name = "xendrm-du", .desc = "Xen PV DRM Display Unit", - .date = "20180221", .major = 1, .minor = 0, @@ -525,11 +524,6 @@ static int xen_drm_drv_init(struct xen_drm_front_info *front_info) if (ret) goto fail_register; - DRM_INFO("Initialized %s %d.%d.%d %s on minor %d\n", - xen_drm_driver.name, xen_drm_driver.major, - xen_drm_driver.minor, xen_drm_driver.patchlevel, - xen_drm_driver.date, drm_dev->primary->index); - return 0; fail_register: diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c index fc81983d9e5e..b47463473472 100644 --- a/drivers/gpu/drm/xlnx/zynqmp_kms.c +++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c @@ -9,12 +9,12 @@ * - Laurent Pinchart <laurent.pinchart@ideasonboard.com> */ +#include <drm/clients/drm_client_setup.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_blend.h> #include <drm/drm_bridge.h> #include <drm/drm_bridge_connector.h> -#include <drm/drm_client_setup.h> #include <drm/drm_connector.h> #include <drm/drm_crtc.h> #include <drm/drm_device.h> @@ -409,7 +409,6 @@ static const struct drm_driver zynqmp_dpsub_drm_driver = { .name = "zynqmp-dpsub", .desc = "Xilinx DisplayPort Subsystem Driver", - .date = "20130509", .major = 1, .minor = 0, }; diff --git a/include/drm/drm_client_setup.h b/include/drm/clients/drm_client_setup.h index 46aab3fb46be..46aab3fb46be 100644 --- a/include/drm/drm_client_setup.h +++ b/include/drm/clients/drm_client_setup.h diff --git a/include/drm/display/drm_dp_helper.h b/include/drm/display/drm_dp_helper.h index 279624833ea9..8f4054a56039 100644 --- a/include/drm/display/drm_dp_helper.h +++ b/include/drm/display/drm_dp_helper.h @@ -567,6 +567,11 @@ int drm_dp_dpcd_read_phy_link_status(struct drm_dp_aux *aux, enum drm_dp_phy dp_phy, u8 link_status[DP_LINK_STATUS_SIZE]); +int drm_dp_dpcd_write_payload(struct drm_dp_aux *aux, + int vcpid, u8 start_time_slot, u8 time_slot_count); +int drm_dp_dpcd_clear_payload(struct drm_dp_aux *aux); +int drm_dp_dpcd_poll_act_handled(struct drm_dp_aux *aux, int timeout_ms); + bool drm_dp_send_real_edid_checksum(struct drm_dp_aux *aux, u8 real_edid_checksum); diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h index 1bbbcb8e2d23..1b6e59139e6c 100644 --- a/include/drm/drm_drv.h +++ b/include/drm/drm_drv.h @@ -401,8 +401,6 @@ struct drm_driver { char *name; /** @desc: driver description */ char *desc; - /** @date: driver date, unused, to be removed */ - char *date; /** * @driver_features: diff --git a/include/drm/drm_utils.h b/include/drm/drm_utils.h index 70775748d243..15fa9b6865f4 100644 --- a/include/drm/drm_utils.h +++ b/include/drm/drm_utils.h @@ -12,8 +12,12 @@ #include <linux/types.h> +struct drm_edid; + int drm_get_panel_orientation_quirk(int width, int height); +int drm_get_panel_min_brightness_quirk(const struct drm_edid *edid); + signed long drm_timeout_abs_to_jiffies(int64_t timeout_nsec); #endif diff --git a/include/drm/intel/xe_pciids.h b/include/drm/intel/xe_pciids.h new file mode 100644 index 000000000000..16d4b8bb590a --- /dev/null +++ b/include/drm/intel/xe_pciids.h @@ -0,0 +1,235 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2022 Intel Corporation + */ + +#ifndef _XE_PCIIDS_H_ +#define _XE_PCIIDS_H_ + +/* + * Lists below can be turned into initializers for a struct pci_device_id + * by defining INTEL_VGA_DEVICE: + * + * #define INTEL_VGA_DEVICE(id, info) { \ + * 0x8086, id, \ + * ~0, ~0, \ + * 0x030000, 0xff0000, \ + * (unsigned long) info } + * + * And then calling like: + * + * XE_TGL_12_GT1_IDS(INTEL_VGA_DEVICE, ## __VA_ARGS__) + * + * To turn them into something else, just provide a different macro passed as + * first argument. + */ + +/* TGL */ +#define XE_TGL_GT1_IDS(MACRO__, ...) \ + MACRO__(0x9A60, ## __VA_ARGS__), \ + MACRO__(0x9A68, ## __VA_ARGS__), \ + MACRO__(0x9A70, ## __VA_ARGS__) + +#define XE_TGL_GT2_IDS(MACRO__, ...) \ + MACRO__(0x9A40, ## __VA_ARGS__), \ + MACRO__(0x9A49, ## __VA_ARGS__), \ + MACRO__(0x9A59, ## __VA_ARGS__), \ + MACRO__(0x9A78, ## __VA_ARGS__), \ + MACRO__(0x9AC0, ## __VA_ARGS__), \ + MACRO__(0x9AC9, ## __VA_ARGS__), \ + MACRO__(0x9AD9, ## __VA_ARGS__), \ + MACRO__(0x9AF8, ## __VA_ARGS__) + +#define XE_TGL_IDS(MACRO__, ...) \ + XE_TGL_GT1_IDS(MACRO__, ## __VA_ARGS__),\ + XE_TGL_GT2_IDS(MACRO__, ## __VA_ARGS__) + +/* RKL */ +#define XE_RKL_IDS(MACRO__, ...) \ + MACRO__(0x4C80, ## __VA_ARGS__), \ + MACRO__(0x4C8A, ## __VA_ARGS__), \ + MACRO__(0x4C8B, ## __VA_ARGS__), \ + MACRO__(0x4C8C, ## __VA_ARGS__), \ + MACRO__(0x4C90, ## __VA_ARGS__), \ + MACRO__(0x4C9A, ## __VA_ARGS__) + +/* DG1 */ +#define XE_DG1_IDS(MACRO__, ...) \ + MACRO__(0x4905, ## __VA_ARGS__), \ + MACRO__(0x4906, ## __VA_ARGS__), \ + MACRO__(0x4907, ## __VA_ARGS__), \ + MACRO__(0x4908, ## __VA_ARGS__), \ + MACRO__(0x4909, ## __VA_ARGS__) + +/* ADL-S */ +#define XE_ADLS_IDS(MACRO__, ...) \ + MACRO__(0x4680, ## __VA_ARGS__), \ + MACRO__(0x4682, ## __VA_ARGS__), \ + MACRO__(0x4688, ## __VA_ARGS__), \ + MACRO__(0x468A, ## __VA_ARGS__), \ + MACRO__(0x468B, ## __VA_ARGS__), \ + MACRO__(0x4690, ## __VA_ARGS__), \ + MACRO__(0x4692, ## __VA_ARGS__), \ + MACRO__(0x4693, ## __VA_ARGS__) + +/* ADL-P */ +#define XE_ADLP_IDS(MACRO__, ...) \ + MACRO__(0x46A0, ## __VA_ARGS__), \ + MACRO__(0x46A1, ## __VA_ARGS__), \ + MACRO__(0x46A2, ## __VA_ARGS__), \ + MACRO__(0x46A3, ## __VA_ARGS__), \ + MACRO__(0x46A6, ## __VA_ARGS__), \ + MACRO__(0x46A8, ## __VA_ARGS__), \ + MACRO__(0x46AA, ## __VA_ARGS__), \ + MACRO__(0x462A, ## __VA_ARGS__), \ + MACRO__(0x4626, ## __VA_ARGS__), \ + MACRO__(0x4628, ## __VA_ARGS__), \ + MACRO__(0x46B0, ## __VA_ARGS__), \ + MACRO__(0x46B1, ## __VA_ARGS__), \ + MACRO__(0x46B2, ## __VA_ARGS__), \ + MACRO__(0x46B3, ## __VA_ARGS__), \ + MACRO__(0x46C0, ## __VA_ARGS__), \ + MACRO__(0x46C1, ## __VA_ARGS__), \ + MACRO__(0x46C2, ## __VA_ARGS__), \ + MACRO__(0x46C3, ## __VA_ARGS__) + +/* ADL-N */ +#define XE_ADLN_IDS(MACRO__, ...) \ + MACRO__(0x46D0, ## __VA_ARGS__), \ + MACRO__(0x46D1, ## __VA_ARGS__), \ + MACRO__(0x46D2, ## __VA_ARGS__), \ + MACRO__(0x46D3, ## __VA_ARGS__), \ + MACRO__(0x46D4, ## __VA_ARGS__) + +/* RPL-S */ +#define XE_RPLS_IDS(MACRO__, ...) \ + MACRO__(0xA780, ## __VA_ARGS__), \ + MACRO__(0xA781, ## __VA_ARGS__), \ + MACRO__(0xA782, ## __VA_ARGS__), \ + MACRO__(0xA783, ## __VA_ARGS__), \ + MACRO__(0xA788, ## __VA_ARGS__), \ + MACRO__(0xA789, ## __VA_ARGS__), \ + MACRO__(0xA78A, ## __VA_ARGS__), \ + MACRO__(0xA78B, ## __VA_ARGS__) + +/* RPL-U */ +#define XE_RPLU_IDS(MACRO__, ...) \ + MACRO__(0xA721, ## __VA_ARGS__), \ + MACRO__(0xA7A1, ## __VA_ARGS__), \ + MACRO__(0xA7A9, ## __VA_ARGS__), \ + MACRO__(0xA7AC, ## __VA_ARGS__), \ + MACRO__(0xA7AD, ## __VA_ARGS__) + +/* RPL-P */ +#define XE_RPLP_IDS(MACRO__, ...) \ + MACRO__(0xA720, ## __VA_ARGS__), \ + MACRO__(0xA7A0, ## __VA_ARGS__), \ + MACRO__(0xA7A8, ## __VA_ARGS__), \ + MACRO__(0xA7AA, ## __VA_ARGS__), \ + MACRO__(0xA7AB, ## __VA_ARGS__) + +/* DG2 */ +#define XE_DG2_G10_IDS(MACRO__, ...) \ + MACRO__(0x5690, ## __VA_ARGS__), \ + MACRO__(0x5691, ## __VA_ARGS__), \ + MACRO__(0x5692, ## __VA_ARGS__), \ + MACRO__(0x56A0, ## __VA_ARGS__), \ + MACRO__(0x56A1, ## __VA_ARGS__), \ + MACRO__(0x56A2, ## __VA_ARGS__), \ + MACRO__(0x56BE, ## __VA_ARGS__), \ + MACRO__(0x56BF, ## __VA_ARGS__) + +#define XE_DG2_G11_IDS(MACRO__, ...) \ + MACRO__(0x5693, ## __VA_ARGS__), \ + MACRO__(0x5694, ## __VA_ARGS__), \ + MACRO__(0x5695, ## __VA_ARGS__), \ + MACRO__(0x56A5, ## __VA_ARGS__), \ + MACRO__(0x56A6, ## __VA_ARGS__), \ + MACRO__(0x56B0, ## __VA_ARGS__), \ + MACRO__(0x56B1, ## __VA_ARGS__), \ + MACRO__(0x56BA, ## __VA_ARGS__), \ + MACRO__(0x56BB, ## __VA_ARGS__), \ + MACRO__(0x56BC, ## __VA_ARGS__), \ + MACRO__(0x56BD, ## __VA_ARGS__) + +#define XE_DG2_G12_IDS(MACRO__, ...) \ + MACRO__(0x5696, ## __VA_ARGS__), \ + MACRO__(0x5697, ## __VA_ARGS__), \ + MACRO__(0x56A3, ## __VA_ARGS__), \ + MACRO__(0x56A4, ## __VA_ARGS__), \ + MACRO__(0x56B2, ## __VA_ARGS__), \ + MACRO__(0x56B3, ## __VA_ARGS__) + +#define XE_DG2_IDS(MACRO__, ...) \ + XE_DG2_G10_IDS(MACRO__, ## __VA_ARGS__),\ + XE_DG2_G11_IDS(MACRO__, ## __VA_ARGS__),\ + XE_DG2_G12_IDS(MACRO__, ## __VA_ARGS__) + +#define XE_ATS_M150_IDS(MACRO__, ...) \ + MACRO__(0x56C0, ## __VA_ARGS__), \ + MACRO__(0x56C2, ## __VA_ARGS__) + +#define XE_ATS_M75_IDS(MACRO__, ...) \ + MACRO__(0x56C1, ## __VA_ARGS__) + +#define XE_ATS_M_IDS(MACRO__, ...) \ + XE_ATS_M150_IDS(MACRO__, ## __VA_ARGS__),\ + XE_ATS_M75_IDS(MACRO__, ## __VA_ARGS__) + +/* ARL */ +#define XE_ARL_IDS(MACRO__, ...) \ + MACRO__(0x7D41, ## __VA_ARGS__), \ + MACRO__(0x7D51, ## __VA_ARGS__), \ + MACRO__(0x7D67, ## __VA_ARGS__), \ + MACRO__(0x7DD1, ## __VA_ARGS__), \ + MACRO__(0xB640, ## __VA_ARGS__) + +/* MTL */ +#define XE_MTL_IDS(MACRO__, ...) \ + MACRO__(0x7D40, ## __VA_ARGS__), \ + MACRO__(0x7D45, ## __VA_ARGS__), \ + MACRO__(0x7D55, ## __VA_ARGS__), \ + MACRO__(0x7D60, ## __VA_ARGS__), \ + MACRO__(0x7DD5, ## __VA_ARGS__) + +/* PVC */ +#define XE_PVC_IDS(MACRO__, ...) \ + MACRO__(0x0B69, ## __VA_ARGS__), \ + MACRO__(0x0B6E, ## __VA_ARGS__), \ + MACRO__(0x0BD4, ## __VA_ARGS__), \ + MACRO__(0x0BD5, ## __VA_ARGS__), \ + MACRO__(0x0BD6, ## __VA_ARGS__), \ + MACRO__(0x0BD7, ## __VA_ARGS__), \ + MACRO__(0x0BD8, ## __VA_ARGS__), \ + MACRO__(0x0BD9, ## __VA_ARGS__), \ + MACRO__(0x0BDA, ## __VA_ARGS__), \ + MACRO__(0x0BDB, ## __VA_ARGS__), \ + MACRO__(0x0BE0, ## __VA_ARGS__), \ + MACRO__(0x0BE1, ## __VA_ARGS__), \ + MACRO__(0x0BE5, ## __VA_ARGS__) + +#define XE_LNL_IDS(MACRO__, ...) \ + MACRO__(0x6420, ## __VA_ARGS__), \ + MACRO__(0x64A0, ## __VA_ARGS__), \ + MACRO__(0x64B0, ## __VA_ARGS__) + +#define XE_BMG_IDS(MACRO__, ...) \ + MACRO__(0xE202, ## __VA_ARGS__), \ + MACRO__(0xE20B, ## __VA_ARGS__), \ + MACRO__(0xE20C, ## __VA_ARGS__), \ + MACRO__(0xE20D, ## __VA_ARGS__), \ + MACRO__(0xE212, ## __VA_ARGS__) + +#define XE_PTL_IDS(MACRO__, ...) \ + MACRO__(0xB080, ## __VA_ARGS__), \ + MACRO__(0xB081, ## __VA_ARGS__), \ + MACRO__(0xB082, ## __VA_ARGS__), \ + MACRO__(0xB090, ## __VA_ARGS__), \ + MACRO__(0xB091, ## __VA_ARGS__), \ + MACRO__(0xB092, ## __VA_ARGS__), \ + MACRO__(0xB0A0, ## __VA_ARGS__), \ + MACRO__(0xB0A1, ## __VA_ARGS__), \ + MACRO__(0xB0A2, ## __VA_ARGS__), \ + MACRO__(0xB0B0, ## __VA_ARGS__) + +#endif diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h index 5804408815be..8ea11cd8df39 100644 --- a/include/drm/ttm/ttm_bo.h +++ b/include/drm/ttm/ttm_bo.h @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo); int ttm_bo_evict_first(struct ttm_device *bdev, struct ttm_resource_manager *man, struct ttm_operation_ctx *ctx); +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset, + void *buf, int len, int write); vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo, struct vm_fault *vmf); vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf, diff --git a/include/trace/events/amdxdna.h b/include/trace/events/amdxdna.h new file mode 100644 index 000000000000..c6cb2da7b706 --- /dev/null +++ b/include/trace/events/amdxdna.h @@ -0,0 +1,101 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM amdxdna + +#if !defined(_TRACE_AMDXDNA_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_AMDXDNA_H + +#include <drm/gpu_scheduler.h> +#include <linux/tracepoint.h> + +TRACE_EVENT(amdxdna_debug_point, + TP_PROTO(const char *name, u64 number, const char *str), + + TP_ARGS(name, number, str), + + TP_STRUCT__entry(__string(name, name) + __field(u64, number) + __string(str, str)), + + TP_fast_assign(__assign_str(name); + __entry->number = number; + __assign_str(str);), + + TP_printk("%s:%llu %s", __get_str(name), __entry->number, + __get_str(str)) +); + +TRACE_EVENT(xdna_job, + TP_PROTO(struct drm_sched_job *sched_job, const char *name, const char *str, u64 seq), + + TP_ARGS(sched_job, name, str, seq), + + TP_STRUCT__entry(__string(name, name) + __string(str, str) + __field(u64, fence_context) + __field(u64, fence_seqno) + __field(u64, seq)), + + TP_fast_assign(__assign_str(name); + __assign_str(str); + __entry->fence_context = sched_job->s_fence->finished.context; + __entry->fence_seqno = sched_job->s_fence->finished.seqno; + __entry->seq = seq;), + + TP_printk("fence=(context:%llu, seqno:%lld), %s seq#:%lld %s", + __entry->fence_context, __entry->fence_seqno, + __get_str(name), __entry->seq, + __get_str(str)) +); + +DECLARE_EVENT_CLASS(xdna_mbox_msg, + TP_PROTO(char *name, u8 chann_id, u32 opcode, u32 msg_id), + + TP_ARGS(name, chann_id, opcode, msg_id), + + TP_STRUCT__entry(__string(name, name) + __field(u32, chann_id) + __field(u32, opcode) + __field(u32, msg_id)), + + TP_fast_assign(__assign_str(name); + __entry->chann_id = chann_id; + __entry->opcode = opcode; + __entry->msg_id = msg_id;), + + TP_printk("%s.%d id 0x%x opcode 0x%x", __get_str(name), + __entry->chann_id, __entry->msg_id, __entry->opcode) +); + +DEFINE_EVENT(xdna_mbox_msg, mbox_set_tail, + TP_PROTO(char *name, u8 chann_id, u32 opcode, u32 id), + TP_ARGS(name, chann_id, opcode, id) +); + +DEFINE_EVENT(xdna_mbox_msg, mbox_set_head, + TP_PROTO(char *name, u8 chann_id, u32 opcode, u32 id), + TP_ARGS(name, chann_id, opcode, id) +); + +TRACE_EVENT(mbox_irq_handle, + TP_PROTO(char *name, int irq), + + TP_ARGS(name, irq), + + TP_STRUCT__entry(__string(name, name) + __field(int, irq)), + + TP_fast_assign(__assign_str(name); + __entry->irq = irq;), + + TP_printk("%s.%d", __get_str(name), __entry->irq) +); + +#endif /* !defined(_TRACE_AMDXDNA_H) || defined(TRACE_HEADER_MULTI_READ) */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_accel.h new file mode 100644 index 000000000000..af12af8bd699 --- /dev/null +++ b/include/uapi/drm/amdxdna_accel.h @@ -0,0 +1,436 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _UAPI_AMDXDNA_ACCEL_H_ +#define _UAPI_AMDXDNA_ACCEL_H_ + +#include <linux/stddef.h> +#include "drm.h" + +#if defined(__cplusplus) +extern "C" { +#endif + +#define AMDXDNA_INVALID_CMD_HANDLE (~0UL) +#define AMDXDNA_INVALID_ADDR (~0UL) +#define AMDXDNA_INVALID_CTX_HANDLE 0 +#define AMDXDNA_INVALID_BO_HANDLE 0 +#define AMDXDNA_INVALID_FENCE_HANDLE 0 + +enum amdxdna_device_type { + AMDXDNA_DEV_TYPE_UNKNOWN = -1, + AMDXDNA_DEV_TYPE_KMQ, +}; + +enum amdxdna_drm_ioctl_id { + DRM_AMDXDNA_CREATE_HWCTX, + DRM_AMDXDNA_DESTROY_HWCTX, + DRM_AMDXDNA_CONFIG_HWCTX, + DRM_AMDXDNA_CREATE_BO, + DRM_AMDXDNA_GET_BO_INFO, + DRM_AMDXDNA_SYNC_BO, + DRM_AMDXDNA_EXEC_CMD, + DRM_AMDXDNA_GET_INFO, +}; + +/** + * struct qos_info - QoS information for driver. + * @gops: Giga operations per second. + * @fps: Frames per second. + * @dma_bandwidth: DMA bandwidtha. + * @latency: Frame response latency. + * @frame_exec_time: Frame execution time. + * @priority: Request priority. + * + * User program can provide QoS hints to driver. + */ +struct amdxdna_qos_info { + __u32 gops; + __u32 fps; + __u32 dma_bandwidth; + __u32 latency; + __u32 frame_exec_time; + __u32 priority; +}; + +/** + * struct amdxdna_drm_create_hwctx - Create hardware context. + * @ext: MBZ. + * @ext_flags: MBZ. + * @qos_p: Address of QoS info. + * @umq_bo: BO handle for user mode queue(UMQ). + * @log_buf_bo: BO handle for log buffer. + * @max_opc: Maximum operations per cycle. + * @num_tiles: Number of AIE tiles. + * @mem_size: Size of AIE tile memory. + * @umq_doorbell: Returned offset of doorbell associated with UMQ. + * @handle: Returned hardware context handle. + * @syncobj_handle: Returned syncobj handle for command completion. + */ +struct amdxdna_drm_create_hwctx { + __u64 ext; + __u64 ext_flags; + __u64 qos_p; + __u32 umq_bo; + __u32 log_buf_bo; + __u32 max_opc; + __u32 num_tiles; + __u32 mem_size; + __u32 umq_doorbell; + __u32 handle; + __u32 syncobj_handle; +}; + +/** + * struct amdxdna_drm_destroy_hwctx - Destroy hardware context. + * @handle: Hardware context handle. + * @pad: Structure padding. + */ +struct amdxdna_drm_destroy_hwctx { + __u32 handle; + __u32 pad; +}; + +/** + * struct amdxdna_cu_config - configuration for one CU + * @cu_bo: CU configuration buffer bo handle. + * @cu_func: Function of a CU. + * @pad: Structure padding. + */ +struct amdxdna_cu_config { + __u32 cu_bo; + __u8 cu_func; + __u8 pad[3]; +}; + +/** + * struct amdxdna_hwctx_param_config_cu - configuration for CUs in hardware context + * @num_cus: Number of CUs to configure. + * @pad: Structure padding. + * @cu_configs: Array of CU configurations of struct amdxdna_cu_config. + */ +struct amdxdna_hwctx_param_config_cu { + __u16 num_cus; + __u16 pad[3]; + struct amdxdna_cu_config cu_configs[] __counted_by(num_cus); +}; + +enum amdxdna_drm_config_hwctx_param { + DRM_AMDXDNA_HWCTX_CONFIG_CU, + DRM_AMDXDNA_HWCTX_ASSIGN_DBG_BUF, + DRM_AMDXDNA_HWCTX_REMOVE_DBG_BUF, + DRM_AMDXDNA_HWCTX_CONFIG_NUM +}; + +/** + * struct amdxdna_drm_config_hwctx - Configure hardware context. + * @handle: hardware context handle. + * @param_type: Value in enum amdxdna_drm_config_hwctx_param. Specifies the + * structure passed in via param_val. + * @param_val: A structure specified by the param_type struct member. + * @param_val_size: Size of the parameter buffer pointed to by the param_val. + * If param_val is not a pointer, driver can ignore this. + * @pad: Structure padding. + * + * Note: if the param_val is a pointer pointing to a buffer, the maximum size + * of the buffer is 4KiB(PAGE_SIZE). + */ +struct amdxdna_drm_config_hwctx { + __u32 handle; + __u32 param_type; + __u64 param_val; + __u32 param_val_size; + __u32 pad; +}; + +enum amdxdna_bo_type { + AMDXDNA_BO_INVALID = 0, + AMDXDNA_BO_SHMEM, + AMDXDNA_BO_DEV_HEAP, + AMDXDNA_BO_DEV, + AMDXDNA_BO_CMD, +}; + +/** + * struct amdxdna_drm_create_bo - Create a buffer object. + * @flags: Buffer flags. MBZ. + * @vaddr: User VA of buffer if applied. MBZ. + * @size: Size in bytes. + * @type: Buffer type. + * @handle: Returned DRM buffer object handle. + */ +struct amdxdna_drm_create_bo { + __u64 flags; + __u64 vaddr; + __u64 size; + __u32 type; + __u32 handle; +}; + +/** + * struct amdxdna_drm_get_bo_info - Get buffer object information. + * @ext: MBZ. + * @ext_flags: MBZ. + * @handle: DRM buffer object handle. + * @pad: Structure padding. + * @map_offset: Returned DRM fake offset for mmap(). + * @vaddr: Returned user VA of buffer. 0 in case user needs mmap(). + * @xdna_addr: Returned XDNA device virtual address. + */ +struct amdxdna_drm_get_bo_info { + __u64 ext; + __u64 ext_flags; + __u32 handle; + __u32 pad; + __u64 map_offset; + __u64 vaddr; + __u64 xdna_addr; +}; + +/** + * struct amdxdna_drm_sync_bo - Sync buffer object. + * @handle: Buffer object handle. + * @direction: Direction of sync, can be from device or to device. + * @offset: Offset in the buffer to sync. + * @size: Size in bytes. + */ +struct amdxdna_drm_sync_bo { + __u32 handle; +#define SYNC_DIRECT_TO_DEVICE 0U +#define SYNC_DIRECT_FROM_DEVICE 1U + __u32 direction; + __u64 offset; + __u64 size; +}; + +enum amdxdna_cmd_type { + AMDXDNA_CMD_SUBMIT_EXEC_BUF = 0, + AMDXDNA_CMD_SUBMIT_DEPENDENCY, + AMDXDNA_CMD_SUBMIT_SIGNAL, +}; + +/** + * struct amdxdna_drm_exec_cmd - Execute command. + * @ext: MBZ. + * @ext_flags: MBZ. + * @hwctx: Hardware context handle. + * @type: One of command type in enum amdxdna_cmd_type. + * @cmd_handles: Array of command handles or the command handle itself + * in case of just one. + * @args: Array of arguments for all command handles. + * @cmd_count: Number of command handles in the cmd_handles array. + * @arg_count: Number of arguments in the args array. + * @seq: Returned sequence number for this command. + */ +struct amdxdna_drm_exec_cmd { + __u64 ext; + __u64 ext_flags; + __u32 hwctx; + __u32 type; + __u64 cmd_handles; + __u64 args; + __u32 cmd_count; + __u32 arg_count; + __u64 seq; +}; + +/** + * struct amdxdna_drm_query_aie_status - Query the status of the AIE hardware + * @buffer: The user space buffer that will return the AIE status. + * @buffer_size: The size of the user space buffer. + * @cols_filled: A bitmap of AIE columns whose data has been returned in the buffer. + */ +struct amdxdna_drm_query_aie_status { + __u64 buffer; /* out */ + __u32 buffer_size; /* in */ + __u32 cols_filled; /* out */ +}; + +/** + * struct amdxdna_drm_query_aie_version - Query the version of the AIE hardware + * @major: The major version number. + * @minor: The minor version number. + */ +struct amdxdna_drm_query_aie_version { + __u32 major; /* out */ + __u32 minor; /* out */ +}; + +/** + * struct amdxdna_drm_query_aie_tile_metadata - Query the metadata of AIE tile (core, mem, shim) + * @row_count: The number of rows. + * @row_start: The starting row number. + * @dma_channel_count: The number of dma channels. + * @lock_count: The number of locks. + * @event_reg_count: The number of events. + * @pad: Structure padding. + */ +struct amdxdna_drm_query_aie_tile_metadata { + __u16 row_count; + __u16 row_start; + __u16 dma_channel_count; + __u16 lock_count; + __u16 event_reg_count; + __u16 pad[3]; +}; + +/** + * struct amdxdna_drm_query_aie_metadata - Query the metadata of the AIE hardware + * @col_size: The size of a column in bytes. + * @cols: The total number of columns. + * @rows: The total number of rows. + * @version: The version of the AIE hardware. + * @core: The metadata for all core tiles. + * @mem: The metadata for all mem tiles. + * @shim: The metadata for all shim tiles. + */ +struct amdxdna_drm_query_aie_metadata { + __u32 col_size; + __u16 cols; + __u16 rows; + struct amdxdna_drm_query_aie_version version; + struct amdxdna_drm_query_aie_tile_metadata core; + struct amdxdna_drm_query_aie_tile_metadata mem; + struct amdxdna_drm_query_aie_tile_metadata shim; +}; + +/** + * struct amdxdna_drm_query_clock - Metadata for a clock + * @name: The clock name. + * @freq_mhz: The clock frequency. + * @pad: Structure padding. + */ +struct amdxdna_drm_query_clock { + __u8 name[16]; + __u32 freq_mhz; + __u32 pad; +}; + +/** + * struct amdxdna_drm_query_clock_metadata - Query metadata for clocks + * @mp_npu_clock: The metadata for MP-NPU clock. + * @h_clock: The metadata for H clock. + */ +struct amdxdna_drm_query_clock_metadata { + struct amdxdna_drm_query_clock mp_npu_clock; + struct amdxdna_drm_query_clock h_clock; +}; + +enum amdxdna_sensor_type { + AMDXDNA_SENSOR_TYPE_POWER +}; + +/** + * struct amdxdna_drm_query_sensor - The data for single sensor. + * @label: The name for a sensor. + * @input: The current value of the sensor. + * @max: The maximum value possible for the sensor. + * @average: The average value of the sensor. + * @highest: The highest recorded sensor value for this driver load for the sensor. + * @status: The sensor status. + * @units: The sensor units. + * @unitm: Translates value member variables into the correct unit via (pow(10, unitm) * value). + * @type: The sensor type from enum amdxdna_sensor_type. + * @pad: Structure padding. + */ +struct amdxdna_drm_query_sensor { + __u8 label[64]; + __u32 input; + __u32 max; + __u32 average; + __u32 highest; + __u8 status[64]; + __u8 units[16]; + __s8 unitm; + __u8 type; + __u8 pad[6]; +}; + +/** + * struct amdxdna_drm_query_hwctx - The data for single context. + * @context_id: The ID for this context. + * @start_col: The starting column for the partition assigned to this context. + * @num_col: The number of columns in the partition assigned to this context. + * @pad: Structure padding. + * @pid: The Process ID of the process that created this context. + * @command_submissions: The number of commands submitted to this context. + * @command_completions: The number of commands completed by this context. + * @migrations: The number of times this context has been moved to a different partition. + * @preemptions: The number of times this context has been preempted by another context in the + * same partition. + * @errors: The errors for this context. + */ +struct amdxdna_drm_query_hwctx { + __u32 context_id; + __u32 start_col; + __u32 num_col; + __u32 pad; + __s64 pid; + __u64 command_submissions; + __u64 command_completions; + __u64 migrations; + __u64 preemptions; + __u64 errors; +}; + +enum amdxdna_drm_get_param { + DRM_AMDXDNA_QUERY_AIE_STATUS, + DRM_AMDXDNA_QUERY_AIE_METADATA, + DRM_AMDXDNA_QUERY_AIE_VERSION, + DRM_AMDXDNA_QUERY_CLOCK_METADATA, + DRM_AMDXDNA_QUERY_SENSORS, + DRM_AMDXDNA_QUERY_HW_CONTEXTS, + DRM_AMDXDNA_NUM_GET_PARAM, +}; + +/** + * struct amdxdna_drm_get_info - Get some information from the AIE hardware. + * @param: Value in enum amdxdna_drm_get_param. Specifies the structure passed in the buffer. + * @buffer_size: Size of the input buffer. Size needed/written by the kernel. + * @buffer: A structure specified by the param struct member. + */ +struct amdxdna_drm_get_info { + __u32 param; /* in */ + __u32 buffer_size; /* in/out */ + __u64 buffer; /* in/out */ +}; + +#define DRM_IOCTL_AMDXDNA_CREATE_HWCTX \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CREATE_HWCTX, \ + struct amdxdna_drm_create_hwctx) + +#define DRM_IOCTL_AMDXDNA_DESTROY_HWCTX \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_DESTROY_HWCTX, \ + struct amdxdna_drm_destroy_hwctx) + +#define DRM_IOCTL_AMDXDNA_CONFIG_HWCTX \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CONFIG_HWCTX, \ + struct amdxdna_drm_config_hwctx) + +#define DRM_IOCTL_AMDXDNA_CREATE_BO \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CREATE_BO, \ + struct amdxdna_drm_create_bo) + +#define DRM_IOCTL_AMDXDNA_GET_BO_INFO \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_GET_BO_INFO, \ + struct amdxdna_drm_get_bo_info) + +#define DRM_IOCTL_AMDXDNA_SYNC_BO \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_SYNC_BO, \ + struct amdxdna_drm_sync_bo) + +#define DRM_IOCTL_AMDXDNA_EXEC_CMD \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_EXEC_CMD, \ + struct amdxdna_drm_exec_cmd) + +#define DRM_IOCTL_AMDXDNA_GET_INFO \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_GET_INFO, \ + struct amdxdna_drm_get_info) + +#if defined(__cplusplus) +} /* extern c end */ +#endif + +#endif /* _UAPI_AMDXDNA_ACCEL_H_ */ diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h index 2376c73abca1..dbbc404d2b3d 100644 --- a/include/uapi/drm/v3d_drm.h +++ b/include/uapi/drm/v3d_drm.h @@ -43,6 +43,7 @@ extern "C" { #define DRM_V3D_PERFMON_GET_VALUES 0x0a #define DRM_V3D_SUBMIT_CPU 0x0b #define DRM_V3D_PERFMON_GET_COUNTER 0x0c +#define DRM_V3D_PERFMON_SET_GLOBAL 0x0d #define DRM_IOCTL_V3D_SUBMIT_CL DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl) #define DRM_IOCTL_V3D_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo) @@ -61,6 +62,8 @@ extern "C" { #define DRM_IOCTL_V3D_SUBMIT_CPU DRM_IOW(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CPU, struct drm_v3d_submit_cpu) #define DRM_IOCTL_V3D_PERFMON_GET_COUNTER DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_PERFMON_GET_COUNTER, \ struct drm_v3d_perfmon_get_counter) +#define DRM_IOCTL_V3D_PERFMON_SET_GLOBAL DRM_IOW(DRM_COMMAND_BASE + DRM_V3D_PERFMON_SET_GLOBAL, \ + struct drm_v3d_perfmon_set_global) #define DRM_V3D_SUBMIT_CL_FLUSH_CACHE 0x01 #define DRM_V3D_SUBMIT_EXTENSION 0x02 @@ -766,6 +769,21 @@ struct drm_v3d_perfmon_get_counter { __u8 reserved[7]; }; +#define DRM_V3D_PERFMON_CLEAR_GLOBAL 0x0001 + +/** + * struct drm_v3d_perfmon_set_global - ioctl to define a global performance + * monitor + * + * The global performance monitor will be used for all jobs. If a global + * performance monitor is defined, jobs with a self-defined performance + * monitor won't be allowed. + */ +struct drm_v3d_perfmon_set_global { + __u32 flags; + __u32 id; +}; + #if defined(__cplusplus) } #endif diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index 4a8a4a63e99c..0383b52cbd86 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -1486,6 +1486,7 @@ struct drm_xe_oa_unit { __u64 capabilities; #define DRM_XE_OA_CAPS_BASE (1 << 0) #define DRM_XE_OA_CAPS_SYNCS (1 << 1) +#define DRM_XE_OA_CAPS_OA_BUFFER_SIZE (1 << 2) /** @oa_timestamp_freq: OA timestamp freq */ __u64 oa_timestamp_freq; @@ -1651,6 +1652,14 @@ enum drm_xe_oa_property_id { * to the VM bind case. */ DRM_XE_OA_PROPERTY_SYNCS, + + /** + * @DRM_XE_OA_PROPERTY_OA_BUFFER_SIZE: Size of OA buffer to be + * allocated by the driver in bytes. Supported sizes are powers of + * 2 from 128 KiB to 128 MiB. When not specified, a 16 MiB OA + * buffer is allocated by default. + */ + DRM_XE_OA_PROPERTY_OA_BUFFER_SIZE, }; /** |
