Age | Commit message (Collapse) | Author | Files | Lines |
|
The current implementation of drm_sched_start uses a hardcoded
-ECANCELED to dispose of a job when the parent/hw fence is NULL.
This results in drm_sched_job_done being called with -ECANCELED for
each job with a NULL parent in the pending list, making it difficult
to distinguish between recovery methods, whether a queue reset or a
full GPU reset was used.
To improve this, we first try a soft recovery for timeout jobs and
use the error code -ENODATA. If soft recovery fails, we proceed with
a queue reset, where the error code remains -ENODATA for the job.
Finally, for a full GPU reset, we use error codes -ECANCELED or
-ETIME. This patch adds an error code parameter to drm_sched_start,
allowing us to differentiate between queue reset and GPU reset
failures. This enables user mode and test applications to validate
the expected correctness of the requested operation. After a
successful queue reset, the only way to continue normal operation is
to call drm_sched_job_done with the specific error code -ENODATA.
v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain
and amdgpu_device_unlock_reset_domain to allow user mode to track
the queue reset status and distinguish between queue reset and
GPU reset.
v2: Christian suggested using the error codes -ENODATA for queue reset
and -ECANCELED or -ETIME for GPU reset, returned to
amdgpu_cs_wait_ioctl.
v3: To meet the requirements, we introduce a new function
drm_sched_start_ex with an additional parameter to set
dma_fence_set_error, allowing us to handle the specific error
codes appropriately and dispose of bad jobs with the selected
error code depending on whether it was a queue reset or GPU reset.
v4: Alex suggested using a new name, drm_sched_start_with_recovery_error,
which more accurately describes the function's purpose.
Additionally, it was recommended to add documentation details
about the new method.
v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex)
v6 (chk): rebase on upstream changes, cleanup the commit message,
drop the new function again and update all callers,
apply the errno also to scheduler fences with hw fences
v7 (chk): rebased
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com
|
|
Backmerging to get a late RC of v6.10 before moving into v6.11.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
This was basically just another one of amdgpus hacks. The parameter
allowed to restart the scheduler without turning fence signaling on
again.
That this is absolutely not a good idea should be obvious by now since
the fences will then just sit there and never signal.
While at it cleanup the code a bit.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for $kernel-version:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- dp/mst: Fix daisy-chaining at resume
- dsc: Add helper to dump the DSC configuration
- tests: Add tests for the new monochrome TV mode variant
Driver Changes:
- ast: Refactor the mode setting code
- panfrost: Fix devfreq job reporting
- stm: Add LDVS support, DSI PHY updates
- panels:
- New panel: AUO G104STN01, K&d kd101ne3-40ti,
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240704-curvy-outstanding-lizard-bcea78@houat
|
|
Lima DRM driver uses devfreq to perform DVFS, while using simple_ondemand
devfreq governor by default. This causes driver initialization to fail on
boot when simple_ondemand governor isn't built into the kernel statically,
as a result of the missing module dependency and, consequently, the
required governor module not being included in the initial ramdisk. Thus,
let's mark simple_ondemand governor as a softdep for Lima, to have its
kernel module included in the initial ramdisk.
This is a rather longstanding issue that has forced distributions to build
devfreq governors statically into their kernels, [1][2] or may have forced
some users to introduce unnecessary workarounds.
Having simple_ondemand marked as a softdep for Lima may not resolve this
issue for all Linux distributions. In particular, it will remain
unresolved for the distributions whose utilities for the initial ramdisk
generation do not handle the available softdep information [3] properly
yet. However, some Linux distributions already handle softdeps properly
while generating their initial ramdisks, [4] and this is a prerequisite
step in the right direction for the distributions that don't handle them
properly yet.
[1] https://gitlab.manjaro.org/manjaro-arm/packages/core/linux-pinephone/-/blob/6.7-megi/config?ref_type=heads#L5749
[2] https://gitlab.com/postmarketOS/pmaports/-/blob/7f64e287e7732c9eaa029653e73ca3d4ba1c8598/main/linux-postmarketos-allwinner/config-postmarketos-allwinner.aarch64#L4654
[3] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git/commit/?id=49d8e0b59052999de577ab732b719cfbeb89504d
[4] https://github.com/archlinux/mkinitcpio/commit/97ac4d37aae084a050be512f6d8f4489054668ad
Cc: Philip Muller <philm@manjaro.org>
Cc: Oliver Smith <ollieparanoid@postmarketos.org>
Cc: Daniel Smith <danct12@disroot.org>
Cc: stable@vger.kernel.org
Fixes: 1996970773a3 ("drm/lima: Add optional devfreq and cooling device support")
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fdaf2e41bb6a0c5118ff9cc21f4f62583208d885.1718655070.git.dsimic@manjaro.org
|
|
Commit a78027847226 ("drm/gem: Acquire reservation lock in
drm_gem_{pin/unpin}()") moved locking the DRM object's dma reservation to
drm_gem_pin(), but Lima's pin callback kept calling drm_gem_shmem_pin,
which also tries to lock the same dma_resv, leading to a double lock
situation.
As was already done for Panfrost in the previous commit, fix it by
replacing drm_gem_shmem_pin() with its locked variant.
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()")
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Val Packett <val@packett.cool>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523113236.432585-3-adrian.larumbe@collabora.com
|
|
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.
This means that with:
__string(field, mystring)
Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.
There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:
git grep -l __assign_str | while read a ; do
sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
mv /tmp/test-file $a;
done
I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.
Note, the same updates will need to be done for:
__assign_str_len()
__assign_rel_str()
__assign_rel_str_len()
I tested this with both an allmodconfig and an allyesconfig (build only for both).
[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
|
|
Create a simple data struct to hold compatible data so that we don't
have to do the casts to void pointer to hold data.
Fixes the following warning:
drivers/gpu/drm/lima/lima_drv.c:387:13: error: cast to smaller integer
type 'enum lima_gpu_id' from 'const void *'
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240401224329.1228468-3-nunes.erico@gmail.com
|
|
lima uses a shared interrupt, so the interrupt handlers must be prepared
to be called at any time. At driver removal time, the clocks are
disabled early and the interrupts stay registered until the very end of
the remove process due to the devm usage.
This is potentially a bug as the interrupts access device registers
which assumes clocks are enabled. A crash can be triggered by removing
the driver in a kernel with CONFIG_DEBUG_SHIRQ enabled.
This patch frees the interrupts at each lima device finishing callback
so that the handlers are already unregistered by the time we fully
disable clocks.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240401224329.1228468-2-nunes.erico@gmail.com
|
|
There is a race condition in which a rendering job might take just long
enough to trigger the drm sched job timeout handler but also still
complete before the hard reset is done by the timeout handler.
This runs into race conditions not expected by the timeout handler.
In some very specific cases it currently may result in a refcount
imbalance on lima_pm_idle, with a stack dump such as:
[10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0
...
[10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0
...
[10136.669628] Call trace:
[10136.669634] lima_devfreq_record_idle+0xa0/0xb0
[10136.669646] lima_sched_pipe_task_done+0x5c/0xb0
[10136.669656] lima_gp_irq_handler+0xa8/0x120
[10136.669666] __handle_irq_event_percpu+0x48/0x160
[10136.669679] handle_irq_event+0x4c/0xc0
We can prevent that race condition entirely by masking the irqs at the
beginning of the timeout handler, at which point we give up on waiting
for that job entirely.
The irqs will be enabled again at the next hard reset which is already
done as a recovery by the timeout handler.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-4-nunes.erico@gmail.com
|
|
In commit 53cb55b20208 ("drm/lima: handle spurious timeouts due to high
irq latency") a check was added to detect an unexpectedly high interrupt
latency timeout.
With further investigation it was noted that on Mali-450 the pp bcast
irq may also be a trigger of race conditions against the timeout
handler, so add it to this check too.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-3-nunes.erico@gmail.com
|
|
This is needed because we want to reset those devices in device-agnostic
code such as lima_sched.
In particular, masking irqs will be useful before a hard reset to
prevent race conditions.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-2-nunes.erico@gmail.com
|
|
Some debug messages carried the ip name, or included "lima", or
included both the ip name and then the numbered ip name again.
Make the messages more consistent by always looking up and showing
the ip name first.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-9-nunes.erico@gmail.com
|
|
The previous 500ms default timeout was fairly optimistic and could be
hit by real world applications. Many distributions targeting devices
with a Mali-4xx already bumped this timeout to a higher limit.
We can be generous here with a high value as 10s since this should
mostly catch buggy jobs like infinite loop shaders, and these don't
seem to happen very often in real applications.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-8-nunes.erico@gmail.com
|
|
Marking the context as guilty currently only makes the application which
hits a single timeout problem to stop its rendering context entirely.
All jobs submitted later are dropped from the guilty context.
Lima runs on fairly underpowered hardware for modern standards and it is
not entirely unreasonable that a rendering job may time out occasionally
due to high system load or too demanding application stack. In this case
it would be generally preferred to report the error but try to keep the
application going.
Other similar embedded GPU drivers don't make use of the guilty context
flag. Now that there are reliability improvements to the lima timeout
recovery handling, drop the guilty contexts to let the application keep
running in this case.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-7-nunes.erico@gmail.com
|
|
There are several unexplained and unreproduced cases of rendering
timeouts with lima, for which one theory is high IRQ latency coming from
somewhere else in the system.
This kind of occurrence may cause applications to trigger unnecessary
resets of the GPU or even applications to hang if it hits an issue in
the recovery path.
Panfrost already does some special handling to account for such
"spurious timeouts", it makes sense to have this in lima too to reduce
the chance that it hit users.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-6-nunes.erico@gmail.com
|
|
This is required for reliable hard resets. Otherwise, doing a hard reset
while a task is still running (such as a task which is being stopped by
the drm_sched timeout handler) may result in random mmu write timeouts
or lockups which cause the entire gpu to hang.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-5-nunes.erico@gmail.com
|
|
This is required for reliable hard resets. Otherwise, doing a hard reset
while a task is still running (such as a task which is being stopped by
the drm_sched timeout handler) may result in random mmu write timeouts
or lockups which cause the entire gpu to hang.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-4-nunes.erico@gmail.com
|
|
Lima gp jobs use an async reset to avoid having to wait for the soft
reset right after a job. The soft reset is done at the end of a job and
a reset_complete flag is expected to be set at the next job.
However, in case the user runs into a job timeout from any application,
a hard reset is issued to the hardware. This hard reset clears the
reset_complete flag, which causes an error message to show up before the
next job.
This is probably harmless for the execution but can be very confusing to
debug, as it blames a reset timeout on the next application to submit a
job.
Reset the async_reset flag when doing the hard reset so that we don't
get that message.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-3-nunes.erico@gmail.com
|
|
Lima pp jobs use an async reset to avoid having to wait for the soft
reset right after a job. The soft reset is done at the end of a job and
a reset_complete flag is expected to be set at the next job.
However, in case the user runs into a job timeout from any application,
a hard reset is issued to the hardware. This hard reset clears the
reset_complete flag, which causes an error message to show up before the
next job.
This is probably harmless for the execution but can be very confusing to
debug, as it blames a reset timeout on the next application to submit a
job.
Reset the async_reset flag when doing the hard reset so that we don't
get that message.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-2-nunes.erico@gmail.com
|
|
Kickstart 6.9 development cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
When lima_vm_map_bo fails, the resources need to be deallocated, or
there will be memleaks.
Fixes: 6aebc51d7aef ("drm/lima: support heap buffer creation")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117071328.3811480-1-alexious@zju.edu.cn
|
|
Pull drm updates from Dave Airlie:
"This contains two major new drivers:
- imagination is a first driver for Imagination Technologies devices,
it only covers very specific devices, but there is hope to grow it
- xe is a reboot of the i915 GPU (shares display) side using a more
upstream focused development model, and trying to maximise code
sharing. It's not enabled for any hw by default, and will hopefully
get switched on for Intel's Lunarlake.
This also drops a bunch of the old UMS ioctls. It's been dead long
enough.
amdgpu has a bunch of new color management code that is being used in
the Steam Deck.
amdgpu also has a new ACPI WBRF interaction to help avoid radio
interference.
Otherwise it's the usual lots of changes in lots of places.
Detailed summary:
new drivers:
- imagination - new driver for Imagination Technologies GPU
- xe - new driver for Intel GPUs using core drm concepts
core:
- add CLOSE_FB ioctl
- remove old UMS ioctls
- increase max objects to accomodate AMD color mgmt
encoder:
- create per-encoder debugfs directory
edid:
- split out drm_eld
- SAD helpers
- drop edid_firmware module parameter
format-helper:
- cache format conversion buffers
sched:
- move from kthread to workqueue
- rename some internals
- implement dynamic job-flow control
gpuvm:
- provide more features to handle GEM objects
client:
- don't acquire module reference
displayport:
- add mst path property documentation
fdinfo:
- alignment fix
dma-buf:
- add fence timestamp helper
- add fence deadline support
bridge:
- transparent aux-bridge for DP/USB-C
- lt8912b: add suspend/resume support and power regulator support
panel:
- edp: AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49
- chromebook panel support
- elida-kd35t133: rework pm
- powkiddy RK2023 panel
- himax-hx8394: drop prepare/unprepare and shutdown logic
- BOE BP101WX1-100, Powkiddy X55, Ampire AM8001280G
- Evervision VGG644804, SDC ATNA45AF01
- nv3052c: register docs, init sequence fixes, fascontek FS035VG158
- st7701: Anbernic RG-ARC support
- r63353 panel controller
- Ilitek ILI9805 panel controller
- AUO G156HAN04.0
simplefb:
- support memory regions
- support power domains
amdgpu:
- add new 64-bit sequence number infrastructure
- add AMD specific color management
- ACPI WBRF support for RF interference handling
- GPUVM updates
- RAS updates
- DCN 3.5 updates
- Rework PCIe link speed handling
- Document GPU reset types
- DMUB fixes
- eDP fixes
- NBIO 7.9/7.11 updates
- SubVP updates
- XGMI PCIe state dumping for aqua vanjaram
- GFX11 golden register updates
- enable tunnelling on high pri compute
amdkfd:
- Migrate TLB flushing logic to amdgpu
- Trap handler fixes
- Fix restore workers handling on suspend/resume
- Fix possible memory leak in pqm_uninit()
- support import/export of dma-bufs using GEM handles
radeon:
- fix possible overflows in command buffer checking
- check for errors in ring_lock
i915:
- reorg display code for reuse in xe driver
- fdinfo memory stats printing
- DP MST bandwidth mgmt improvements
- DP panel replay enabling
- MTL C20 phy state verification
- MTL DP DSC fractional bpp support
- Audio fastset support
- use dma_fence interfaces instead of i915_sw_fence
- Separate gem and display code
- AUX register macro refactoring
- Separate display module/device parameters
- Move display capabilities debugfs under display
- Makefile cleanups
- Register cleanups
- Move display lock inits under display/
- VLV/CHV DPIO PHY register and interface refactoring
- DSI VBT sequence refactoring
- C10/C20 PHY PLL hardware readout
- DPLL code cleanups
- Cleanup PXP plane protection checks
- Improve display debug msgs
- PSR selective fetch fixes/improvements
- DP MST fixes
- Xe2LPD FBC restrictions removed
- DGFX uses direct VBT pin mapping
- more MTL WAs
- fix MTL eDP bug
- eliminate use of kmap_atomic
habanalabs:
- sysfs entry to identify a device minor id with debugfs path
- sysfs entry to expose device module id
- add signed device info retrieval through INFO ioctl
- add Gaudi2C device support
- pcie reset prepare/done hooks
msm:
- Add support for SDM670, SM8650
- Handle the CFG interconnect to fix the obscure hangs / timeouts
- Kconfig fix for QMP dependency
- use managed allocators
- DPU: SDM670, SM8650 support
- DPU: Enable SmartDMA on SM8350 and SM8450
- DP: enable runtime PM support
- GPU: add metadata UAPI
- GPU: move devcoredumps to GPU device
- GPU: convert to drm_exec
ivpu:
- update FW API
- new debugfs file
- a new NOP job submission test mode
- improve suspend/resume
- PM improvements
- MMU PT optimizations
- firmware profile frequency support
- support for uncached buffers
- switch to gem shmem helpers
- replace kthread with threaded irqs
rockchip:
- rk3066_hdmi: convert to atomic
- vop2: support nv20 and nv30
- rk3588 support
mediatek:
- use devm_platform_ioremap_resource
- stop using iommu_present
- MT8188 VDOSYS1 display support
panfrost:
- PM improvements
- improve interrupt handling as poweroff
qaic:
- allow to run with single MSI
- support host/device time sync
- switch to persistent DRM devices
exynos:
- fix potential error pointer dereference
- fix wrong error checking
- add missing call to drm_atomic_helper_shutdown
omapdrm:
- dma-fence lockdep annotation fix
tidss:
- dma-fence lockdep annotation fix
- support for AM62A7
v3d:
- BCM2712 - rpi5 support
- fdinfo + gputop support
- uapi for CPU job handling
virtio-gpu:
- add context debug name"
* tag 'drm-next-2024-01-10' of git://anongit.freedesktop.org/drm/drm: (2340 commits)
drm/amd/display: Allow z8/z10 from driver
drm/amd/display: fix bandwidth validation failure on DCN 2.1
drm/amdgpu: apply the RV2 system aperture fix to RN/CZN as well
drm/amd/display: Move fixpt_from_s3132 to amdgpu_dm
drm/amd/display: Fix recent checkpatch errors in amdgpu_dm
Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"
drm/amd/display: avoid stringop-overflow warnings for dp_decide_lane_settings()
drm/amd/display: Fix power_helpers.c codestyle
drm/amd/display: Fix hdcp_log.h codestyle
drm/amd/display: Fix hdcp2_execution.c codestyle
drm/amd/display: Fix hdcp_psp.h codestyle
drm/amd/display: Fix freesync.c codestyle
drm/amd/display: Fix hdcp_psp.c codestyle
drm/amd/display: Fix hdcp1_execution.c codestyle
drm/amd/pm/smu7: fix a memleak in smu7_hwmgr_backend_init
drm/amdkfd: Fix iterator used outside loop in 'kfd_add_peer_prop()'
drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()'
drm/amdkfd: Confirm list is non-empty before utilizing list_first_entry in kfd_topology.c
drm/amdgpu: Fix '*fw' from request_firmware() not released in 'amdgpu_ucode_request()'
drm/amdgpu: Fix variable 'mca_funcs' dereferenced before NULL check in 'amdgpu_mca_smu_get_mca_entry()'
...
|
|
This is needed for killing the sched.h dependency on rcupdate.h, and
pid.h is a better place for this code anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.
This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.
However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring run dry.
In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
|
|
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.
1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.
2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.
A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.
v2:
- (Rob Clark) Fix msm build
- Pass in run work queue
v3:
- (Boris) don't have loop in worker
v4:
- (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
- (Boris) default to ordered work queue
v6:
- (Luben / checkpatch) fix alignment in msm_ringbuffer.c
- (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
- (Luben) Update comment for drm_sched_wqueue_enqueue
- (Luben) Positive check for submit_wq in drm_sched_init
- (Luben) s/alloc_submit_wq/own_submit_wq
v7:
- (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
- (Luben) Adjust var names / comments
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
|
|
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Emma Anholt <emma@anholt.net>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
|
|
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Acked-by: Robert Foss <rfoss@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230714174545.4056287-1-robh@kernel.org
|
|
Clear all assignments of struct drm_driver's fd/handle callbacks to
drm_gem_prime_fd_to_handle() and drm_gem_prime_handle_to_fd(). These
functions are called by default. Add a TODO item to convert vmwgfx
to the defaults as well.
v2:
* remove TODO item (Zack)
* also update amdgpu's amdgpu_partition_driver
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simon Ser <contact@emersion.fr>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Jeffrey Hugo <quic_jhugo@quicinc.com> # qaic
Link: https://patchwork.freedesktop.org/patch/msgid/20230620080252.16368-3-tzimmermann@suse.de
|
|
Replace all drm-shmem locks with a GEM reservation lock. This makes locks
consistent with dma-buf locking convention where importers are responsible
for holding reservation lock for all operations performed over dma-bufs,
preventing deadlock between dma-buf importers and exporters.
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230529223935.2672495-7-dmitry.osipenko@collabora.com
|
|
Backmerging into drm-misc-next to get commit 2c1c7ba457d4
("drm/amdgpu: support partition drm devices"), which is required to fix
commit 0adec22702d4 ("drm: Remove struct drm_driver.gem_prime_mmap").
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
All drivers initialize this field with drm_gem_prime_mmap(). Call
the function directly and remove the field. Simplifies the code and
resolves a long-standing TODO item.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230613150441.17720-3-tzimmermann@suse.de
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230507162616.1368908-26-u.kleine-koenig@pengutronix.de
|
|
The drm sched entity must be flushed before finishing, to account for
jobs potentially still in flight at that time.
Lima did not do this flush until now, so switch the destroy call to the
drm_sched_entity_destroy() wrapper which will take care of that.
This fixes a regression on lima which started since the rework in
commit 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini")
where some specific types of applications may hang indefinitely.
Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini")
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230606143247.433018-1-nunes.erico@gmail.com
|
|
This reverts commit bccafec957a5c4b22ac29e53a39e82d0a0008348.
This is due to the depend commit has been reverted on upstream:
commit baad10973fdb ("Revert "drm/scheduler: track GPU active time per entity"")
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230404002601.24136-4-yq882255@163.com
|
|
This reverts commit 87767de835edf527b879a363d518c33da68adb81.
This is due to the depend commit has been reverted on upstream:
commit baad10973fdb ("Revert "drm/scheduler: track GPU active time per entity"")
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230404002601.24136-3-yq882255@163.com
|
|
This reverts commit 4a66f3da99dcb4dcbd28544110636b50adfb0f0d.
This is due to the depend commit has been reverted on upstream:
commit baad10973fdb ("Revert "drm/scheduler: track GPU active time per entity"")
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230404002601.24136-2-yq882255@163.com
|
|
This exposes an accumulated active time per client via the fdinfo
infrastructure per execution engine, following
Documentation/gpu/drm-usage-stats.rst.
In lima, the exposed execution engines are gp and pp.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230312233052.21095-4-nunes.erico@gmail.com
|
|
To track if fds are pointing to the same execution context and export
the expected information to fdinfo, similar to what is done in other
drivers.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230312233052.21095-3-nunes.erico@gmail.com
|
|
lima maintains a context manager per drm_file, similar to amdgpu.
In order to account for the complete usage per drm_file, all of the
associated contexts need to be considered.
Previously released contexts also need to be accounted for but their
drm_sched_entity info is gone once they get released, so account for it
in the ctx_mgr.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230312233052.21095-2-nunes.erico@gmail.com
|
|
Smatch reports:
drivers/gpu/drm/lima/lima_drv.c:396 lima_pdev_probe() warn:
missing unwind goto?
Store return value in err and goto 'err_out0' which has
lima_sched_slab_fini() before returning.
Fixes: a1d2a6339961 ("drm/lima: driver for ARM Mali4xx GPUs")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230314052711.4061652-1-harshit.m.mogalapalli@oracle.com
|
|
As lima_gem_add_deps() performs the same steps as
drm_sched_job_add_syncobj_dependency(), replace the open-coded
implementation in Lima in order to simply use the DRM function.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20230224214133.411966-1-mcanal@igalia.com
|
|
This reverts commit 67b7836d4458790f1261e31fe0ce3250989784f0.
The locking appears incomplete. A caller of SHMEM helper's pin
function never acquires the dma-buf reservation lock. So we get
WARNING: CPU: 3 PID: 967 at drivers/gpu/drm/drm_gem_shmem_helper.c:243 drm_gem_shmem_pin+0x42/0x90 [drm_shmem_helper]
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230228152612.19971-1-tzimmermann@suse.de
|
|
Replace all drm-shmem locks with a GEM reservation lock. This makes locks
consistent with dma-buf locking convention where importers are responsible
for holding reservation lock for all operations performed over dma-bufs,
preventing deadlock between dma-buf importers and exporters.
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://lore.kernel.org/all/20230108210445.3948344-8-dmitry.osipenko@collabora.com/
|
|
Linux 6.1-rc6
This is needed for drm-misc-next and tegra.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Commit d8c32d3971e4 ("drm/lima: Migrate to dev_pm_opp_set_config()")
introduced a regression as it may undo the clk_names setting in case
the optional regulator is missing. This resulted in test and performance
regressions with lima.
Restore the old behavior where clk_names is set separately so it is not
undone in case of a missing optional regulator.
Fixes: d8c32d3971e4 ("drm/lima: Migrate to dev_pm_opp_set_config()")
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221027073200.3885839-1-nunes.erico@gmail.com
|
|
The new common dma-buf locking convention will require buffer importers
to hold the reservation lock around mapping operations. Make DRM GEM core
to take the lock around the vmapping operations and update DRM drivers to
use the locked functions for the case where DRM core now holds the lock.
This patch prepares DRM core and drivers to the common dynamic dma-buf
locking convention.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-4-dmitry.osipenko@collabora.com
|
|
The OPP core now provides a unified API for setting all configuration
types, i.e. dev_pm_opp_set_config().
Lets start using it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Make dev_pm_opp_set_regulators() accept a NULL terminated list of names
instead of making the callers keep the two parameters in sync, which
creates an opportunity for bugs to get in.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Steven Price <steven.price@arm.com> # panfrost
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Instead of distingting between shared and exclusive fences specify
the fence usage while adding fences.
Rework all drivers to use this interface instead and deprecate the old one.
v2: some kerneldoc comments suggested by Daniel
v3: fix a missing case in radeon
v4: rebase on nouveau changes, fix lockdep and temporary disable warning
v5: more documentation updates
v6: separate internal dma_resv changes from this patch, avoids to
disable warning temporary, rebase on upstream changes
v7: fix missed case in lima driver, minimize changes to i915_gem_busy_ioctl
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-3-christian.koenig@amd.com
|