summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/lima
AgeCommit message (Collapse)AuthorFilesLines
2024-09-06drm/sched: add optional errno to drm_sched_start()Christian König1-1/+1
The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with -ECANCELED for each job with a NULL parent in the pending list, making it difficult to distinguish between recovery methods, whether a queue reset or a full GPU reset was used. To improve this, we first try a soft recovery for timeout jobs and use the error code -ENODATA. If soft recovery fails, we proceed with a queue reset, where the error code remains -ENODATA for the job. Finally, for a full GPU reset, we use error codes -ECANCELED or -ETIME. This patch adds an error code parameter to drm_sched_start, allowing us to differentiate between queue reset and GPU reset failures. This enables user mode and test applications to validate the expected correctness of the requested operation. After a successful queue reset, the only way to continue normal operation is to call drm_sched_job_done with the specific error code -ENODATA. v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain and amdgpu_device_unlock_reset_domain to allow user mode to track the queue reset status and distinguish between queue reset and GPU reset. v2: Christian suggested using the error codes -ENODATA for queue reset and -ECANCELED or -ETIME for GPU reset, returned to amdgpu_cs_wait_ioctl. v3: To meet the requirements, we introduce a new function drm_sched_start_ex with an additional parameter to set dma_fence_set_error, allowing us to handle the specific error codes appropriately and dispose of bad jobs with the selected error code depending on whether it was a queue reset or GPU reset. v4: Alex suggested using a new name, drm_sched_start_with_recovery_error, which more accurately describes the function's purpose. Additionally, it was recommended to add documentation details about the new method. v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex) v6 (chk): rebase on upstream changes, cleanup the commit message, drop the new function again and update all callers, apply the errno also to scheduler fences with hw fences v7 (chk): rebased Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com
2024-07-29Merge drm/drm-next into drm-misc-nextThomas Zimmermann1-1/+1
Backmerging to get a late RC of v6.10 before moving into v6.11. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-25drm/scheduler: remove full_recover from drm_sched_startChristian König1-1/+1
This was basically just another one of amdgpus hacks. The parameter allowed to restart the scheduler without turning fence signaling on again. That this is absolutely not a good idea should be obvious by now since the fences will then just sit there and never signal. While at it cleanup the code a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com
2024-07-05Merge tag 'drm-misc-next-2024-07-04' of ↵Daniel Vetter1-0/+1
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for $kernel-version: UAPI Changes: Cross-subsystem Changes: Core Changes: - dp/mst: Fix daisy-chaining at resume - dsc: Add helper to dump the DSC configuration - tests: Add tests for the new monochrome TV mode variant Driver Changes: - ast: Refactor the mode setting code - panfrost: Fix devfreq job reporting - stm: Add LDVS support, DSI PHY updates - panels: - New panel: AUO G104STN01, K&d kd101ne3-40ti, Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240704-curvy-outstanding-lizard-bcea78@houat
2024-06-30drm/lima: Mark simple_ondemand governor as softdepDragan Simic1-0/+1
Lima DRM driver uses devfreq to perform DVFS, while using simple_ondemand devfreq governor by default. This causes driver initialization to fail on boot when simple_ondemand governor isn't built into the kernel statically, as a result of the missing module dependency and, consequently, the required governor module not being included in the initial ramdisk. Thus, let's mark simple_ondemand governor as a softdep for Lima, to have its kernel module included in the initial ramdisk. This is a rather longstanding issue that has forced distributions to build devfreq governors statically into their kernels, [1][2] or may have forced some users to introduce unnecessary workarounds. Having simple_ondemand marked as a softdep for Lima may not resolve this issue for all Linux distributions. In particular, it will remain unresolved for the distributions whose utilities for the initial ramdisk generation do not handle the available softdep information [3] properly yet. However, some Linux distributions already handle softdeps properly while generating their initial ramdisks, [4] and this is a prerequisite step in the right direction for the distributions that don't handle them properly yet. [1] https://gitlab.manjaro.org/manjaro-arm/packages/core/linux-pinephone/-/blob/6.7-megi/config?ref_type=heads#L5749 [2] https://gitlab.com/postmarketOS/pmaports/-/blob/7f64e287e7732c9eaa029653e73ca3d4ba1c8598/main/linux-postmarketos-allwinner/config-postmarketos-allwinner.aarch64#L4654 [3] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git/commit/?id=49d8e0b59052999de577ab732b719cfbeb89504d [4] https://github.com/archlinux/mkinitcpio/commit/97ac4d37aae084a050be512f6d8f4489054668ad Cc: Philip Muller <philm@manjaro.org> Cc: Oliver Smith <ollieparanoid@postmarketos.org> Cc: Daniel Smith <danct12@disroot.org> Cc: stable@vger.kernel.org Fixes: 1996970773a3 ("drm/lima: Add optional devfreq and cooling device support") Signed-off-by: Dragan Simic <dsimic@manjaro.org> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/fdaf2e41bb6a0c5118ff9cc21f4f62583208d885.1718655070.git.dsimic@manjaro.org
2024-05-29drm/lima: Fix dma_resv deadlock at drm object pin timeAdrián Larumbe1-1/+1
Commit a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") moved locking the DRM object's dma reservation to drm_gem_pin(), but Lima's pin callback kept calling drm_gem_shmem_pin, which also tries to lock the same dma_resv, leading to a double lock situation. As was already done for Panfrost in the previous commit, fix it by replacing drm_gem_shmem_pin() with its locked variant. Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Steven Price <steven.price@arm.com> Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Tested-by: Val Packett <val@packett.cool> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523113236.432585-3-adrian.larumbe@collabora.com
2024-05-23tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)1-1/+1
With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-04-15drm/lima: fix void pointer to enum lima_gpu_id cast warningErico Nunes2-3/+23
Create a simple data struct to hold compatible data so that we don't have to do the casts to void pointer to hold data. Fixes the following warning: drivers/gpu/drm/lima/lima_drv.c:387:13: error: cast to smaller integer type 'enum lima_gpu_id' from 'const void *' Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240401224329.1228468-3-nunes.erico@gmail.com
2024-04-15drm/lima: fix shared irq handling on driver removeErico Nunes3-0/+11
lima uses a shared interrupt, so the interrupt handlers must be prepared to be called at any time. At driver removal time, the clocks are disabled early and the interrupts stay registered until the very end of the remove process due to the devm usage. This is potentially a bug as the interrupts access device registers which assumes clocks are enabled. A crash can be triggered by removing the driver in a kernel with CONFIG_DEBUG_SHIRQ enabled. This patch frees the interrupts at each lima device finishing callback so that the handlers are already unregistered by the time we fully disable clocks. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240401224329.1228468-2-nunes.erico@gmail.com
2024-04-15drm/lima: mask irqs in timeout path before hard resetErico Nunes1-0/+7
There is a race condition in which a rendering job might take just long enough to trigger the drm sched job timeout handler but also still complete before the hard reset is done by the timeout handler. This runs into race conditions not expected by the timeout handler. In some very specific cases it currently may result in a refcount imbalance on lima_pm_idle, with a stack dump such as: [10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0 ... [10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0 ... [10136.669628] Call trace: [10136.669634] lima_devfreq_record_idle+0xa0/0xb0 [10136.669646] lima_sched_pipe_task_done+0x5c/0xb0 [10136.669656] lima_gp_irq_handler+0xa8/0x120 [10136.669666] __handle_irq_event_percpu+0x48/0x160 [10136.669679] handle_irq_event+0x4c/0xc0 We can prevent that race condition entirely by masking the irqs at the beginning of the timeout handler, at which point we give up on waiting for that job entirely. The irqs will be enabled again at the next hard reset which is already done as a recovery by the timeout handler. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-4-nunes.erico@gmail.com
2024-04-15drm/lima: include pp bcast irq in timeout handler checkErico Nunes1-0/+2
In commit 53cb55b20208 ("drm/lima: handle spurious timeouts due to high irq latency") a check was added to detect an unexpectedly high interrupt latency timeout. With further investigation it was noted that on Mali-450 the pp bcast irq may also be a trigger of race conditions against the timeout handler, so add it to this check too. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-3-nunes.erico@gmail.com
2024-04-15drm/lima: add mask irq callback to gp and ppErico Nunes5-0/+42
This is needed because we want to reset those devices in device-agnostic code such as lima_sched. In particular, masking irqs will be useful before a hard reset to prevent race conditions. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-2-nunes.erico@gmail.com
2024-02-12drm/lima: standardize debug messages by ip nameErico Nunes5-30/+36
Some debug messages carried the ip name, or included "lima", or included both the ip name and then the numbered ip name again. Make the messages more consistent by always looking up and showing the ip name first. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-9-nunes.erico@gmail.com
2024-02-12drm/lima: increase default job timeout to 10sErico Nunes1-1/+1
The previous 500ms default timeout was fairly optimistic and could be hit by real world applications. Many distributions targeting devices with a Mali-4xx already bumped this timeout to a higher limit. We can be generous here with a high value as 10s since this should mostly catch buggy jobs like infinite loop shaders, and these don't seem to happen very often in real applications. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-8-nunes.erico@gmail.com
2024-02-12drm/lima: remove guilty drm_sched context handlingErico Nunes4-7/+4
Marking the context as guilty currently only makes the application which hits a single timeout problem to stop its rendering context entirely. All jobs submitted later are dropped from the guilty context. Lima runs on fairly underpowered hardware for modern standards and it is not entirely unreasonable that a rendering job may time out occasionally due to high system load or too demanding application stack. In this case it would be generally preferred to report the error but try to keep the application going. Other similar embedded GPU drivers don't make use of the guilty context flag. Now that there are reliability improvements to the lima timeout recovery handling, drop the guilty contexts to let the application keep running in this case. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-7-nunes.erico@gmail.com
2024-02-12drm/lima: handle spurious timeouts due to high irq latencyErico Nunes1-3/+28
There are several unexplained and unreproduced cases of rendering timeouts with lima, for which one theory is high IRQ latency coming from somewhere else in the system. This kind of occurrence may cause applications to trigger unnecessary resets of the GPU or even applications to hang if it hits an issue in the recovery path. Panfrost already does some special handling to account for such "spurious timeouts", it makes sense to have this in lima too to reduce the chance that it hit users. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-6-nunes.erico@gmail.com
2024-02-12drm/lima: set gp bus_stop bit before hard resetErico Nunes1-0/+12
This is required for reliable hard resets. Otherwise, doing a hard reset while a task is still running (such as a task which is being stopped by the drm_sched timeout handler) may result in random mmu write timeouts or lockups which cause the entire gpu to hang. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-5-nunes.erico@gmail.com
2024-02-12drm/lima: set pp bus_stop bit before hard resetErico Nunes1-0/+13
This is required for reliable hard resets. Otherwise, doing a hard reset while a task is still running (such as a task which is being stopped by the drm_sched timeout handler) may result in random mmu write timeouts or lockups which cause the entire gpu to hang. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-4-nunes.erico@gmail.com
2024-02-12drm/lima: reset async_reset on gp hard resetErico Nunes1-0/+7
Lima gp jobs use an async reset to avoid having to wait for the soft reset right after a job. The soft reset is done at the end of a job and a reset_complete flag is expected to be set at the next job. However, in case the user runs into a job timeout from any application, a hard reset is issued to the hardware. This hard reset clears the reset_complete flag, which causes an error message to show up before the next job. This is probably harmless for the execution but can be very confusing to debug, as it blames a reset timeout on the next application to submit a job. Reset the async_reset flag when doing the hard reset so that we don't get that message. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-3-nunes.erico@gmail.com
2024-02-12drm/lima: reset async_reset on pp hard resetErico Nunes1-0/+7
Lima pp jobs use an async reset to avoid having to wait for the soft reset right after a job. The soft reset is done at the end of a job and a reset_complete flag is expected to be set at the next job. However, in case the user runs into a job timeout from any application, a hard reset is issued to the hardware. This hard reset clears the reset_complete flag, which causes an error message to show up before the next job. This is probably harmless for the execution but can be very confusing to debug, as it blames a reset timeout on the next application to submit a job. Reset the async_reset flag when doing the hard reset so that we don't get that message. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-2-nunes.erico@gmail.com
2024-01-29Merge drm/drm-next into drm-misc-nextMaxime Ripard1-0/+1
Kickstart 6.9 development cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-01-19drm/lima: fix a memleak in lima_heap_allocZhipeng Lu1-9/+14
When lima_vm_map_bo fails, the resources need to be deallocated, or there will be memleaks. Fixes: 6aebc51d7aef ("drm/lima: support heap buffer creation") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240117071328.3811480-1-alexious@zju.edu.cn
2024-01-12Merge tag 'drm-next-2024-01-10' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2-3/+3
Pull drm updates from Dave Airlie: "This contains two major new drivers: - imagination is a first driver for Imagination Technologies devices, it only covers very specific devices, but there is hope to grow it - xe is a reboot of the i915 GPU (shares display) side using a more upstream focused development model, and trying to maximise code sharing. It's not enabled for any hw by default, and will hopefully get switched on for Intel's Lunarlake. This also drops a bunch of the old UMS ioctls. It's been dead long enough. amdgpu has a bunch of new color management code that is being used in the Steam Deck. amdgpu also has a new ACPI WBRF interaction to help avoid radio interference. Otherwise it's the usual lots of changes in lots of places. Detailed summary: new drivers: - imagination - new driver for Imagination Technologies GPU - xe - new driver for Intel GPUs using core drm concepts core: - add CLOSE_FB ioctl - remove old UMS ioctls - increase max objects to accomodate AMD color mgmt encoder: - create per-encoder debugfs directory edid: - split out drm_eld - SAD helpers - drop edid_firmware module parameter format-helper: - cache format conversion buffers sched: - move from kthread to workqueue - rename some internals - implement dynamic job-flow control gpuvm: - provide more features to handle GEM objects client: - don't acquire module reference displayport: - add mst path property documentation fdinfo: - alignment fix dma-buf: - add fence timestamp helper - add fence deadline support bridge: - transparent aux-bridge for DP/USB-C - lt8912b: add suspend/resume support and power regulator support panel: - edp: AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49 - chromebook panel support - elida-kd35t133: rework pm - powkiddy RK2023 panel - himax-hx8394: drop prepare/unprepare and shutdown logic - BOE BP101WX1-100, Powkiddy X55, Ampire AM8001280G - Evervision VGG644804, SDC ATNA45AF01 - nv3052c: register docs, init sequence fixes, fascontek FS035VG158 - st7701: Anbernic RG-ARC support - r63353 panel controller - Ilitek ILI9805 panel controller - AUO G156HAN04.0 simplefb: - support memory regions - support power domains amdgpu: - add new 64-bit sequence number infrastructure - add AMD specific color management - ACPI WBRF support for RF interference handling - GPUVM updates - RAS updates - DCN 3.5 updates - Rework PCIe link speed handling - Document GPU reset types - DMUB fixes - eDP fixes - NBIO 7.9/7.11 updates - SubVP updates - XGMI PCIe state dumping for aqua vanjaram - GFX11 golden register updates - enable tunnelling on high pri compute amdkfd: - Migrate TLB flushing logic to amdgpu - Trap handler fixes - Fix restore workers handling on suspend/resume - Fix possible memory leak in pqm_uninit() - support import/export of dma-bufs using GEM handles radeon: - fix possible overflows in command buffer checking - check for errors in ring_lock i915: - reorg display code for reuse in xe driver - fdinfo memory stats printing - DP MST bandwidth mgmt improvements - DP panel replay enabling - MTL C20 phy state verification - MTL DP DSC fractional bpp support - Audio fastset support - use dma_fence interfaces instead of i915_sw_fence - Separate gem and display code - AUX register macro refactoring - Separate display module/device parameters - Move display capabilities debugfs under display - Makefile cleanups - Register cleanups - Move display lock inits under display/ - VLV/CHV DPIO PHY register and interface refactoring - DSI VBT sequence refactoring - C10/C20 PHY PLL hardware readout - DPLL code cleanups - Cleanup PXP plane protection checks - Improve display debug msgs - PSR selective fetch fixes/improvements - DP MST fixes - Xe2LPD FBC restrictions removed - DGFX uses direct VBT pin mapping - more MTL WAs - fix MTL eDP bug - eliminate use of kmap_atomic habanalabs: - sysfs entry to identify a device minor id with debugfs path - sysfs entry to expose device module id - add signed device info retrieval through INFO ioctl - add Gaudi2C device support - pcie reset prepare/done hooks msm: - Add support for SDM670, SM8650 - Handle the CFG interconnect to fix the obscure hangs / timeouts - Kconfig fix for QMP dependency - use managed allocators - DPU: SDM670, SM8650 support - DPU: Enable SmartDMA on SM8350 and SM8450 - DP: enable runtime PM support - GPU: add metadata UAPI - GPU: move devcoredumps to GPU device - GPU: convert to drm_exec ivpu: - update FW API - new debugfs file - a new NOP job submission test mode - improve suspend/resume - PM improvements - MMU PT optimizations - firmware profile frequency support - support for uncached buffers - switch to gem shmem helpers - replace kthread with threaded irqs rockchip: - rk3066_hdmi: convert to atomic - vop2: support nv20 and nv30 - rk3588 support mediatek: - use devm_platform_ioremap_resource - stop using iommu_present - MT8188 VDOSYS1 display support panfrost: - PM improvements - improve interrupt handling as poweroff qaic: - allow to run with single MSI - support host/device time sync - switch to persistent DRM devices exynos: - fix potential error pointer dereference - fix wrong error checking - add missing call to drm_atomic_helper_shutdown omapdrm: - dma-fence lockdep annotation fix tidss: - dma-fence lockdep annotation fix - support for AM62A7 v3d: - BCM2712 - rpi5 support - fdinfo + gputop support - uapi for CPU job handling virtio-gpu: - add context debug name" * tag 'drm-next-2024-01-10' of git://anongit.freedesktop.org/drm/drm: (2340 commits) drm/amd/display: Allow z8/z10 from driver drm/amd/display: fix bandwidth validation failure on DCN 2.1 drm/amdgpu: apply the RV2 system aperture fix to RN/CZN as well drm/amd/display: Move fixpt_from_s3132 to amdgpu_dm drm/amd/display: Fix recent checkpatch errors in amdgpu_dm Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole" drm/amd/display: avoid stringop-overflow warnings for dp_decide_lane_settings() drm/amd/display: Fix power_helpers.c codestyle drm/amd/display: Fix hdcp_log.h codestyle drm/amd/display: Fix hdcp2_execution.c codestyle drm/amd/display: Fix hdcp_psp.h codestyle drm/amd/display: Fix freesync.c codestyle drm/amd/display: Fix hdcp_psp.c codestyle drm/amd/display: Fix hdcp1_execution.c codestyle drm/amd/pm/smu7: fix a memleak in smu7_hwmgr_backend_init drm/amdkfd: Fix iterator used outside loop in 'kfd_add_peer_prop()' drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()' drm/amdkfd: Confirm list is non-empty before utilizing list_first_entry in kfd_topology.c drm/amdgpu: Fix '*fw' from request_firmware() not released in 'amdgpu_ucode_request()' drm/amdgpu: Fix variable 'mca_funcs' dereferenced before NULL check in 'amdgpu_mca_smu_get_mca_entry()' ...
2023-12-21sched.h: move pid helpers to pid.hKent Overstreet1-0/+1
This is needed for killing the sched.h dependency on rcupdate.h, and pid.h is a better place for this code anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-10drm/sched: implement dynamic job-flow controlDanilo Krummrich2-2/+2
Currently, job flow control is implemented simply by limiting the number of jobs in flight. Therefore, a scheduler is initialized with a credit limit that corresponds to the number of jobs which can be sent to the hardware. This implies that for each job, drivers need to account for the maximum job size possible in order to not overflow the ring buffer. However, there are drivers, such as Nouveau, where the job size has a rather large range. For such drivers it can easily happen that job submissions not even filling the ring by 1% can block subsequent submissions, which, in the worst case, can lead to the ring run dry. In order to overcome this issue, allow for tracking the actual job size instead of the number of jobs. Therefore, add a field to track a job's credit count, which represents the number of credits a job contributes to the scheduler's credit limit. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
2023-11-02drm/sched: Convert drm scheduler to use a work queue rather than kthreadMatthew Brost1-1/+1
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1 mapping between a drm_gpu_scheduler and drm_sched_entity. At first this seems a bit odd but let us explain the reasoning below. 1. In Xe the submission order from multiple drm_sched_entity is not guaranteed to be the same completion even if targeting the same hardware engine. This is because in Xe we have a firmware scheduler, the GuC, which allowed to reorder, timeslice, and preempt submissions. If a using shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls apart as the TDR expects submission order == completion order. Using a dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. 2. In Xe submissions are done via programming a ring buffer (circular buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow control on the ring for free. A problem with this design is currently a drm_gpu_scheduler uses a kthread for submission / job cleanup. This doesn't scale if a large number of drm_gpu_scheduler are used. To work around the scaling issue, use a worker rather than kthread for submission / job cleanup. v2: - (Rob Clark) Fix msm build - Pass in run work queue v3: - (Boris) don't have loop in worker v4: - (Tvrtko) break out submit ready, stop, start helpers into own patch v5: - (Boris) default to ordered work queue v6: - (Luben / checkpatch) fix alignment in msm_ringbuffer.c - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue - (Luben) Update comment for drm_sched_wqueue_enqueue - (Luben) Positive check for submit_wq in drm_sched_init - (Luben) s/alloc_submit_wq/own_submit_wq v7: - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue v8: - (Luben) Adjust var names / comments Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
2023-10-26drm/sched: Convert the GPU scheduler to variable number of run-queuesLuben Tuikov1-1/+3
The GPU scheduler has now a variable number of run-queues, which are set up at drm_sched_init() time. This way, each driver announces how many run-queues it requires (supports) per each GPU scheduler it creates. Note, that run-queues correspond to scheduler "priorities", thus if the number of run-queues is set to 1 at drm_sched_init(), then that scheduler supports a single run-queue, i.e. single "priority". If a driver further sets a single entity per run-queue, then this creates a 1-to-1 correspondence between a scheduler and a scheduled entity. Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Emma Anholt <emma@anholt.net> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
2023-07-21drm: Explicitly include correct DT includesRob Herring1-1/+2
The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it as merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Acked-by: Robert Foss <rfoss@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230714174545.4056287-1-robh@kernel.org
2023-06-26drm: Clear fd/handle callbacks in struct drm_driverThomas Zimmermann1-2/+0
Clear all assignments of struct drm_driver's fd/handle callbacks to drm_gem_prime_fd_to_handle() and drm_gem_prime_handle_to_fd(). These functions are called by default. Add a TODO item to convert vmwgfx to the defaults as well. v2: * remove TODO item (Zack) * also update amdgpu's amdgpu_partition_driver Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Simon Ser <contact@emersion.fr> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Jeffrey Hugo <quic_jhugo@quicinc.com> # qaic Link: https://patchwork.freedesktop.org/patch/msgid/20230620080252.16368-3-tzimmermann@suse.de
2023-06-21drm/shmem-helper: Switch to reservation lockDmitry Osipenko1-4/+4
Replace all drm-shmem locks with a GEM reservation lock. This makes locks consistent with dma-buf locking convention where importers are responsible for holding reservation lock for all operations performed over dma-bufs, preventing deadlock between dma-buf importers and exporters. Suggested-by: Daniel Vetter <daniel@ffwll.ch> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230529223935.2672495-7-dmitry.osipenko@collabora.com
2023-06-19Merge drm/drm-next into drm-misc-nextThomas Zimmermann1-1/+1
Backmerging into drm-misc-next to get commit 2c1c7ba457d4 ("drm/amdgpu: support partition drm devices"), which is required to fix commit 0adec22702d4 ("drm: Remove struct drm_driver.gem_prime_mmap"). Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-06-19drm: Remove struct drm_driver.gem_prime_mmapThomas Zimmermann1-1/+0
All drivers initialize this field with drm_gem_prime_mmap(). Call the function directly and remove the field. Simplifies the code and resolves a long-standing TODO item. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230613150441.17720-3-tzimmermann@suse.de
2023-06-08drm/lima: Convert to platform remove callback returning voidUwe Kleine-König1-3/+2
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230507162616.1368908-26-u.kleine-koenig@pengutronix.de
2023-06-07drm/lima: fix sched context destroyErico Nunes1-1/+1
The drm sched entity must be flushed before finishing, to account for jobs potentially still in flight at that time. Lima did not do this flush until now, so switch the destroy call to the drm_sched_entity_destroy() wrapper which will take care of that. This fixes a regression on lima which started since the rework in commit 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") where some specific types of applications may hang indefinitely. Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230606143247.433018-1-nunes.erico@gmail.com
2023-04-05Revert "drm/lima: add usage counting method to ctx_mgr"Qiang Yu2-32/+1
This reverts commit bccafec957a5c4b22ac29e53a39e82d0a0008348. This is due to the depend commit has been reverted on upstream: commit baad10973fdb ("Revert "drm/scheduler: track GPU active time per entity"") Acked-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230404002601.24136-4-yq882255@163.com
2023-04-05Revert "drm/lima: allocate unique id per drm_file"Qiang Yu3-16/+0
This reverts commit 87767de835edf527b879a363d518c33da68adb81. This is due to the depend commit has been reverted on upstream: commit baad10973fdb ("Revert "drm/scheduler: track GPU active time per entity"") Acked-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230404002601.24136-3-yq882255@163.com
2023-04-05Revert "drm/lima: add show_fdinfo for drm usage stats"Qiang Yu1-30/+1
This reverts commit 4a66f3da99dcb4dcbd28544110636b50adfb0f0d. This is due to the depend commit has been reverted on upstream: commit baad10973fdb ("Revert "drm/scheduler: track GPU active time per entity"") Acked-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230404002601.24136-2-yq882255@163.com
2023-04-02drm/lima: add show_fdinfo for drm usage statsErico Nunes1-1/+30
This exposes an accumulated active time per client via the fdinfo infrastructure per execution engine, following Documentation/gpu/drm-usage-stats.rst. In lima, the exposed execution engines are gp and pp. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230312233052.21095-4-nunes.erico@gmail.com
2023-04-02drm/lima: allocate unique id per drm_fileErico Nunes3-0/+16
To track if fds are pointing to the same execution context and export the expected information to fdinfo, similar to what is done in other drivers. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230312233052.21095-3-nunes.erico@gmail.com
2023-04-02drm/lima: add usage counting method to ctx_mgrErico Nunes2-1/+32
lima maintains a context manager per drm_file, similar to amdgpu. In order to account for the complete usage per drm_file, all of the associated contexts need to be considered. Previously released contexts also need to be accounted for but their drm_sched_entity info is gone once they get released, so account for it in the ctx_mgr. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230312233052.21095-2-nunes.erico@gmail.com
2023-04-02drm/lima/lima_drv: Add missing unwind goto in lima_pdev_probe()Harshit Mogalapalli1-2/+4
Smatch reports: drivers/gpu/drm/lima/lima_drv.c:396 lima_pdev_probe() warn: missing unwind goto? Store return value in err and goto 'err_out0' which has lima_sched_slab_fini() before returning. Fixes: a1d2a6339961 ("drm/lima: driver for ARM Mali4xx GPUs") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230314052711.4061652-1-harshit.m.mogalapalli@oracle.com
2023-03-22drm/lima: Use drm_sched_job_add_syncobj_dependency()Maíra Canal1-10/+2
As lima_gem_add_deps() performs the same steps as drm_sched_job_add_syncobj_dependency(), replace the open-coded implementation in Lima in order to simply use the DRM function. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Maíra Canal <mairacanal@riseup.net> Link: https://patchwork.freedesktop.org/patch/msgid/20230224214133.411966-1-mcanal@igalia.com
2023-02-28Revert "drm/shmem-helper: Switch to reservation lock"Thomas Zimmermann1-4/+4
This reverts commit 67b7836d4458790f1261e31fe0ce3250989784f0. The locking appears incomplete. A caller of SHMEM helper's pin function never acquires the dma-buf reservation lock. So we get WARNING: CPU: 3 PID: 967 at drivers/gpu/drm/drm_gem_shmem_helper.c:243 drm_gem_shmem_pin+0x42/0x90 [drm_shmem_helper] Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230228152612.19971-1-tzimmermann@suse.de
2023-02-27drm/shmem-helper: Switch to reservation lockDmitry Osipenko1-4/+4
Replace all drm-shmem locks with a GEM reservation lock. This makes locks consistent with dma-buf locking convention where importers are responsible for holding reservation lock for all operations performed over dma-bufs, preventing deadlock between dma-buf importers and exporters. Suggested-by: Daniel Vetter <daniel@ffwll.ch> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://lore.kernel.org/all/20230108210445.3948344-8-dmitry.osipenko@collabora.com/
2022-11-24Backmerge tag 'v6.1-rc6' into drm-nextDave Airlie1-6/+9
Linux 6.1-rc6 This is needed for drm-misc-next and tegra. Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-11-14drm/lima: Fix opp clkname setting in case of missing regulatorErico Nunes1-6/+9
Commit d8c32d3971e4 ("drm/lima: Migrate to dev_pm_opp_set_config()") introduced a regression as it may undo the clk_names setting in case the optional regulator is missing. This resulted in test and performance regressions with lima. Restore the old behavior where clk_names is set separately so it is not undone in case of a missing optional regulator. Fixes: d8c32d3971e4 ("drm/lima: Migrate to dev_pm_opp_set_config()") Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221027073200.3885839-1-nunes.erico@gmail.com
2022-10-18drm/gem: Take reservation lock for vmap/vunmap operationsDmitry Osipenko1-2/+2
The new common dma-buf locking convention will require buffer importers to hold the reservation lock around mapping operations. Make DRM GEM core to take the lock around the vmapping operations and update DRM drivers to use the locked functions for the case where DRM core now holds the lock. This patch prepares DRM core and drivers to the common dynamic dma-buf locking convention. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-4-dmitry.osipenko@collabora.com
2022-07-08drm/lima: Migrate to dev_pm_opp_set_config()Viresh Kumar1-5/+6
The OPP core now provides a unified API for setting all configuration types, i.e. dev_pm_opp_set_config(). Lets start using it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-07-08OPP: Make dev_pm_opp_set_regulators() accept NULL terminated listViresh Kumar1-1/+2
Make dev_pm_opp_set_regulators() accept a NULL terminated list of names instead of making the callers keep the two parameters in sync, which creates an opportunity for bugs to get in. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Steven Price <steven.price@arm.com> # panfrost Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-04-07dma-buf: specify usage while adding fences to dma_resv obj v7Christian König1-4/+3
Instead of distingting between shared and exclusive fences specify the fence usage while adding fences. Rework all drivers to use this interface instead and deprecate the old one. v2: some kerneldoc comments suggested by Daniel v3: fix a missing case in radeon v4: rebase on nouveau changes, fix lockdep and temporary disable warning v5: more documentation updates v6: separate internal dma_resv changes from this patch, avoids to disable warning temporary, rebase on upstream changes v7: fix missed case in lima driver, minimize changes to i915_gem_busy_ioctl Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-3-christian.koenig@amd.com