| Age | Commit message (Collapse) | Author | Files | Lines |
|
variables related to memory alloc/free
commit 52b330799e2d6f825ae2bb74662ec1b10eb954bb upstream.
Exynos Virtual Display driver performs memory alloc/free operations
without lock protection, which easily causes concurrency problem.
For example, use-after-free can occur in race scenario like this:
```
CPU0 CPU1 CPU2
---- ---- ----
vidi_connection_ioctl()
if (vidi->connection) // true
drm_edid = drm_edid_alloc(); // alloc drm_edid
...
ctx->raw_edid = drm_edid;
...
drm_mode_getconnector()
drm_helper_probe_single_connector_modes()
vidi_get_modes()
if (ctx->raw_edid) // true
drm_edid_dup(ctx->raw_edid);
if (!drm_edid) // false
...
vidi_connection_ioctl()
if (vidi->connection) // false
drm_edid_free(ctx->raw_edid); // free drm_edid
...
drm_edid_alloc(drm_edid->edid)
kmemdup(edid); // UAF!!
...
```
To prevent these vulns, at least in vidi_context, member variables related
to memory alloc/free should be protected with ctx->lock.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4cb1b327135dddf3d0ec2544ea36ed05ba2252bc ]
xe_guc_print_info is void-returning, but the function pointer it is
assigned to expects an int-returning function, leading to the following
CFI error:
[ 206.873690] CFI failure at guc_debugfs_show+0xa1/0xf0 [xe]
(target: xe_guc_print_info+0x0/0x370 [xe]; expected type: 0xbe3bc66a)
Fix this by updating xe_guc_print_info to return an integer.
Fixes: e15826bb3c2c ("drm/xe/guc: Refactor GuC debugfs initialization")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: George D Sworo <george.d.sworo@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260129182547.32899-2-daniele.ceraolospurio@intel.com
(cherry picked from commit dd8ea2f2ab71b98887fdc426b0651dbb1d1ea760)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bb36170d959fad7f663f91eb9c32a84dd86bef2b ]
Restrict D3Cold disablement for BMG to unsupported NUC platforms,
instead of disabling it on all platforms.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: 3e331a6715ee ("drm/xe/pm: Temporarily disable D3Cold on BMG")
Link: https://patch.msgid.link/20260123173238.1642383-1-karthik.poosa@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 39125eaf8863ab09d70c4b493f58639b08d5a897)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7ee9b3e091c63da71e15c72003f1f07e467f5158 ]
The topology query helper advanced the user pointer by the size
of the pointer, not the size of the structure. This can misalign
the output blob and corrupt the following mask. Fix the increment
to use sizeof(*topo).
There is no issue currently, as sizeof(*topo) happens to be equal
to sizeof(topo) on 64-bit systems (both evaluate to 8 bytes).
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260130043907.465128-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit c2a6859138e7f73ad904be17dd7d1da6cc7f06b3)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0e0c8f4d16de92520623aa1ea485cadbf64e6929 ]
The mgag200_bmc_stop_scanout() function is called by the .atomic_disable()
handler for the MGA G200 VGA BMC encoder. This function performs a few
register writes to inform the BMC of an upcoming mode change, and then
polls to wait until the BMC actually stops.
The polling is implemented using a busy loop with udelay() and an iteration
timeout of 300, resulting in the function blocking for 300 milliseconds.
The function gets called ultimately by the output_poll_execute work thread
for the DRM output change polling thread of the mgag200 driver:
kworker/0:0-mm_ 3528 [000] 4555.315364:
ffffffffaa0e25b3 delay_halt.part.0+0x33
ffffffffc03f6188 mgag200_bmc_stop_scanout+0x178
ffffffffc087ae7a disable_outputs+0x12a
ffffffffc087c12a drm_atomic_helper_commit_tail+0x1a
ffffffffc03fa7b6 mgag200_mode_config_helper_atomic_commit_tail+0x26
ffffffffc087c9c1 commit_tail+0x91
ffffffffc087d51b drm_atomic_helper_commit+0x11b
ffffffffc0509694 drm_atomic_commit+0xa4
ffffffffc05105e8 drm_client_modeset_commit_atomic+0x1e8
ffffffffc0510ce6 drm_client_modeset_commit_locked+0x56
ffffffffc0510e24 drm_client_modeset_commit+0x24
ffffffffc088a743 __drm_fb_helper_restore_fbdev_mode_unlocked+0x93
ffffffffc088a683 drm_fb_helper_hotplug_event+0xe3
ffffffffc050f8aa drm_client_dev_hotplug+0x9a
ffffffffc088555a output_poll_execute+0x29a
ffffffffa9b35924 process_one_work+0x194
ffffffffa9b364ee worker_thread+0x2fe
ffffffffa9b3ecad kthread+0xdd
ffffffffa9a08549 ret_from_fork+0x29
On a server running ptp4l with the mgag200 driver loaded, we found that
ptp4l would sometimes get blocked from execution because of this busy
waiting loop.
Every so often, approximately once every 20 minutes -- though with large
variance -- the output_poll_execute() thread would detect some sort of
change that required performing a hotplug event which results in attempting
to stop the BMC scanout, resulting in a 300msec delay on one CPU.
On this system, ptp4l was pinned to a single CPU. When the
output_poll_execute() thread ran on that CPU, it blocked ptp4l from
executing for its 300 millisecond duration.
This resulted in PTP service disruptions such as failure to send a SYNC
message on time, failure to handle ANNOUNCE messages on time, and clock
check warnings from the application. All of this despite the application
being configured with FIFO_RT and a higher priority than the background
workqueue tasks. (However, note that the kernel did not use
CONFIG_PREEMPT...)
It is unclear if the event is due to a faulty VGA connection, another bug,
or actual events causing a change in the connection. At least on the system
under test it is not a one-time event and consistently causes disruption to
the time sensitive applications.
The function has some helpful comments explaining what steps it is
attempting to take. In particular, step 3a and 3b are explained as such:
3a - The third step is to verify if there is an active scan. We are
waiting on a 0 on remhsyncsts (<XSPAREREG<0>.
3b - This step occurs only if the remove is actually scanning. We are
waiting for the end of the frame which is a 1 on remvsyncsts
(<XSPAREREG<1>).
The actual steps 3a and 3b are implemented as while loops with a
non-sleeping udelay(). The first step iterates while the tmp value at
position 0 is *not* set. That is, it keeps iterating as long as the bit is
zero. If the bit is already 0 (because there is no active scan), it will
iterate the entire 300 attempts which wastes 300 milliseconds in total.
This is opposite of what the description claims.
The step 3b logic only executes if we do not iterate over the entire 300
attempts in the first loop. If it does trigger, it is trying to check and
wait for a 1 on the remvsyncsts. However, again the condition is actually
inverted and it will loop as long as the bit is 1, stopping once it hits
zero (rather than the explained attempt to wait until we see a 1).
Worse, both loops are implemented using non-sleeping waits which spin
instead of allowing the scheduler to run other processes. If the kernel is
not configured to allow arbitrary preemption, it will waste valuable CPU
time doing nothing.
There does not appear to be any documentation for the BMC register
interface, beyond what is in the comments here. It seems more probable that
the comment here is correct and the implementation accidentally got
inverted from the intended logic.
Reading through other DRM driver implementations, it does not appear that
the .atomic_enable or .atomic_disable handlers need to delay instead of
sleep. For example, the ast_astdp_encoder_helper_atomic_disable() function
calls ast_dp_set_phy_sleep() which uses msleep(). The "atomic" in the name
is referring to the atomic modesetting support, which is the support to
enable atomic configuration from userspace, and not to the "atomic context"
of the kernel. There is no reason to use udelay() here if a sleep would be
sufficient.
Replace the while loops with a read_poll_timeout() based implementation
that will sleep between iterations, and which stops polling once the
condition is met (instead of looping as long as the condition is met). This
aligns with the commented behavior and avoids blocking on the CPU while
doing nothing.
Note the RREG_DAC is implemented using a statement expression to allow
working properly with the read_poll_timeout family of functions. The other
RREG_<TYPE> macros ought to be cleaned up to have better semantics, and
several places in the mgag200 driver could make use of RREG_DAC or similar
RREG_* macros should likely be cleaned up for better semantics as well, but
that task has been left as a future cleanup for a non-bugfix.
Fixes: 414c45310625 ("mgag200: initial g200se driver (v2)")
Suggested-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260202-jk-mgag200-fix-bad-udelay-v2-1-ce1e9665987d@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8f959d37c1f2efec6dac55915ee82302e98101fb ]
Some shimmer/colorful points appears when using the steamOS color
pipeline for HDR on gaming with DCN32. These points look like black
values being wrongly mapped to red/blue/green values. It was caused
because the number of hw points in regular LUTs and in a shaper LUT was
treated as the same.
DCN3+ regular LUTs have 257 bases and implicit deltas (i.e. HW
calculates them), but shaper LUT is a special case: it has 256 bases and
256 deltas, as in DCN1-2 regular LUTs, and outputs 14-bit values.
Fix that by setting by decreasing in 1 the number of HW points computed
in the LUT segmentation so that shaper LUT (i.e. fixpoint == true) keeps
the same DCN10 CM logic and regular LUTs go with `hw_points + 1`.
CC: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Fixes: 4d5fd3d08ea9 ("drm/amd/display: PQ tail accuracy")
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5006505b19a2119e71c008044d59f6d753c858b9)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f377ea0561c9576cdb7e3890bcf6b8168d455464 ]
This reverts commit bc6d54ac7e7436721a19443265f971f890c13cc5.
The workload profile needs to be in the default state when
the dc idle optimizaion state is entered. However, when
jobs come in for video or GFX or compute, the profile may
be set to a non-default profile resulting in the dc idle
optimizations not taking affect and resulting in higher
power usage. As such we need to pause the workload profile
changes during this transition. When this patch was originally
committed, it caused a regression with a Dell U3224KB display,
but no other problems were reported at the time. When it
was reapplied (this patch) to address increased power usage, it
seems to have caused additional regressions. This change seems
to have a number of side affects (audio issues, stuttering,
etc.). I suspect the pause should only happen when all displays
are off or in static screen mode, but I think this call site
gets called more often than that which results in idle state
entry more often than intended. For now revert.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4894
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4717
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4725
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4517
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4806
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
Cc: Roman Li <Roman.Li@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1412482b714358ffa30d38fd3dd0b05795163648)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0de604d0357d0d22cbf03af1077d174b641707b6 ]
During Mode 1 reset, the ASIC undergoes a reset cycle and becomes
temporarily inaccessible via PCIe. Any attempt to access MMIO registers
during this window (e.g., from interrupt handlers or other driver threads)
can result in uncompleted PCIe transactions, leading to NMI panics or
system hangs.
To prevent this, set the `no_hw_access` flag to true immediately after
triggering the reset. This signals other driver components to skip
register accesses while the device is offline.
A memory barrier `smp_mb()` is added to ensure the flag update is
globally visible to all cores before the driver enters the sleep/wait
state.
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7edb503fe4b6d67f47d8bb0dfafb8e699bb0f8a4)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
CalculatePrefetchSchedule()
[ Upstream commit f54a91f5337cd918eb86cf600320d25b6cfd8209 ]
After an innocuous optimization change in clang-22,
dml30_ModeSupportAndSystemConfigurationFull() is over the 2048 byte
stack limit for display_mode_vba_30.c.
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_mode_vba_30.c:3529:6: warning: stack frame size (2096) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
3529 | void dml30_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
| ^
With clang-21, this function was already close to the limit:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_mode_vba_30.c:3529:6: warning: stack frame size (1912) exceeds limit (1586) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
3529 | void dml30_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
| ^
CalculatePrefetchSchedule() has a large number of parameters, which must
be passed on the stack. Most of the parameters between the two callsites
are the same, so they can be accessed through the existing mode_lib
pointer, instead of being passed as explicit arguments. Doing this
reduces the stack size of dml30_ModeSupportAndSystemConfigurationFull()
from 2096 bytes to 1912 bytes with clang-22.
Closes: https://github.com/ClangBuiltLinux/linux/issues/2117
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b20b3fc4210f83089f835cdb91deec4b0778761a)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 8302d0afeaec0bc57d951dd085e0cffe997d4d18 upstream.
The r570 firmware with certain GPUs (at least RTX6000) needs this
flag to reflect the suspend vs runtime PM state of the driver.
This uses that info to set the correct flags to the firmware.
This fixes a regression on RTX6000 and other GPUs since r570 firmware
was enabled.
Fixes: 53dac0623853 ("drm/nouveau/gsp: add support for 570.144")
Cc: <stable@vger.kernel.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Tested-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260203052431.2219998-4-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90caca3b7264cc3e92e347b2004fff4e386fc26e upstream.
There are two layers of sequence numbers, one at the msg level
and one at the rpc level.
570 firmware started asserting on the sequence numbers being
in the right order, and we would see nocat records with asserts
in them.
Add the rpc level sequence number support.
Fixes: 53dac0623853 ("drm/nouveau/gsp: add support for 570.144")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Tested-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260203052431.2219998-2-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8f8a4dce64013737701d13565cf6107f42b725ea upstream.
This is just refactoring to allow the lower layers to distinguish
between suspend and runtime suspend.
GSP 570 needs to set a flag with the GPU is going into GCOFF,
this flag taken from the opengpu driver is set whenever runtime
suspend is enterning GCOFF but not for normal suspend paths.
This just refactors the code, a subsequent patch use the information.
Fixes: 53dac0623853 ("drm/nouveau/gsp: add support for 570.144")
Cc: <stable@vger.kernel.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Tested-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260203052431.2219998-3-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 243b467dea1735fed904c2e54d248a46fa417a2d upstream.
This reverts commit 7294863a6f01248d72b61d38478978d638641bee.
This commit was erroneously applied again after commit 0ab5d711ec74
("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
removed it, leading to very hard to debug crashes, when used with a system with two
AMD GPUs of which only one supports ASPM.
Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/
Link: https://github.com/acpica/acpica/issues/1060
Fixes: 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 97a9689300eb2b393ba5efc17c8e5db835917080)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1478a34470bf4755465d29b348b24a610bccc180 upstream.
commit f81cd793119e ("drm/amd/amdgpu: Fix MES init sequence") caused
a dependency on new enough MES firmware to use amdgpu. This was fixed
on most gfx11 and gfx12 hardware with commit 0180e0a5dd5c
("drm/amdgpu/mes: add compatibility checks for set_hw_resource_1"), but
this left out that GC 11.0.4 had breakage at MES 0x51.
Bump the requirement to 0x52 instead.
Reported-by: danijel@nausys.com
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4576
Fixes: f81cd793119e ("drm/amd/amdgpu: Fix MES init sequence")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c2d2ccc85faf8cc6934d50c18e43097eb453ade2)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6c65db809796717f0a96cf22f80405dbc1a31a4b upstream.
This reverts commit 604826acb3f53c6648a7ee99a3914ead680ab7fb.
Apparently there is more to supporting atomic modesetting than
providing atomic_(check|commit) callbacks. Before this revert:
WARNING: [] drivers/gpu/drm/drm_plane.c:389 at .__drm_universal_plane_init+0x13c/0x794 [drm], CPU#1: modprobe/1790
BUG: Kernel NULL pointer dereference on read at 0x00000000
.drm_atomic_get_plane_state+0xd4/0x210 [drm] (unreliable)
.drm_client_modeset_commit_atomic+0xf8/0x338 [drm]
.drm_client_modeset_commit_locked+0x80/0x260 [drm]
.drm_client_modeset_commit+0x40/0x7c [drm]
.__drm_fb_helper_restore_fbdev_mode_unlocked.part.0+0xfc/0x108 [drm_kms_helper]
.drm_fb_helper_set_par+0x8c/0xb8 [drm_kms_helper]
.fbcon_init+0x31c/0x618
[...]
.__drm_fb_helper_initial_config_and_unlock+0x474/0x7f4 [drm_kms_helper]
.drm_fbdev_client_hotplug+0xb0/0x120 [drm_client_lib]
.drm_client_register+0x88/0xe4 [drm]
.drm_fbdev_client_setup+0x12c/0x19b4 [drm_client_lib]
.drm_client_setup+0x15c/0x18c [drm_client_lib]
.nouveau_drm_probe+0x19c/0x268 [nouveau]
Fixes: 604826acb3f5 ("drm/nouveau/disp: Set drm_mode_config_funcs.atomic_(check|commit)")
Reported-by: John Ogness <john.ogness@linutronix.de>
Closes: https://lore.kernel.org/lkml/87ldhf1prw.fsf@jogness.linutronix.de
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Tested-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20260130113230.2311221-1-john.ogness@linutronix.de
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b1defcdc4457649db236415ee618a7151e28788c upstream.
The EXEC_COUNT field must be > 0. In the gfx shadow
handling we always emit a cond_exec packet after the gfx_shadow
packet, but the EXEC_COUNT never gets patched. This leads
to a hang when we try and reset queues on gfx11 APUs.
Fixes: c68cbbfd54c6 ("drm/amdgpu: cleanup conditional execution")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ba205ac3d6e83f56c4f824f23f1b4522cb844ff3)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8b1ecc9377bc641533cd9e76dfa3aee3cd04a007 upstream.
On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and
ih2 interrupt ring buffers are not initialized. This is by design, as
these secondary IH rings are only available on discrete GPUs. See
vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when
AMD_IS_APU is set.
However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to
get the timestamp of the last interrupt entry. When retry faults are
enabled on APUs (noretry=0), this function is called from the SVM page
fault recovery path, resulting in a NULL pointer dereference when
amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[].
The crash manifests as:
BUG: kernel NULL pointer dereference, address: 0000000000000004
RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu]
Call Trace:
amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu]
svm_range_restore_pages+0xae5/0x11c0 [amdgpu]
amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu]
gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu]
amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu]
amdgpu_ih_process+0x84/0x100 [amdgpu]
This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW
IP 9.3.0 from noretry=1") which changed the default for Renoir APU from
noretry=1 to noretry=0, enabling retry fault handling and thus
exercising the buggy code path.
Fix this by adding a check for ih1.ring_size before attempting to use
it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu:
Rework retry fault removal"). This is needed if the hardware doesn't
support secondary HW IH rings.
v2: additional updates (Alex)
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814
Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6ce8d536c80aa1f059e82184f0d1994436b1d526)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dfd64f6e8cd7b59238cdaf8af7a55711f13a89db upstream.
Kernel gfx queues do not need to be reinitialized or
remapped after a reset. Align with gfx11.
v2: preserve init and remap for MMIO case.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a6d6ed694d72b66b0ed7a483d5effa01acd3951)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9077d32a4b570fa20500aa26e149981c366c965d upstream.
wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a2918f958d3f677ea93c0ac257cb6ba69b7abb7c)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3eb46fbb601f9a0b4df8eba79252a0a85e983044 upstream.
Kernel gfx queues do not need to be reinitialized or
remapped after a reset. This fixes queue reset failures
on APUs.
v2: preserve init and remap for MMIO case.
Fixes: b3e9bfd86658 ("drm/amdgpu/gfx11: add ring reset callbacks")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b340ff216fdabfe71ba0cdd47e9835a141d08e10)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b1f810471c6a6bd349f7f9f2f2fed96082056d46 upstream.
wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1f16866bdb1daed7a80ca79ae2837a9832a74fbc)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cc4f433b14e05eaa4a98fd677b836e9229422387 upstream.
wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e80b1d1aa1073230b6c25a1a72e88f37e425ccda)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e7fbff9e7622a00c2b53cb14df481916f0019742 upstream.
The reference clock is supposed to be 100Mhz, but it
appears to actually be slightly lower (99.81Mhz).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14451
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 637fee3954d4bd509ea9d95ad1780fc174489860)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 239d0ccf567c3b09aed58eb88cd3376af37aaf14 upstream.
v1:
resolve the issue where some freq frequencies cannot be set correctly
due to insufficient floating-point precision.
v2:
patch this convert on 'max' value only.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 53868dd8774344051999c880115740da92f97feb)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c764b7af15289051718b4859a67f9a3bc69d3fb2 upstream.
v1:
resolve the issue where some freq frequencies cannot be set correctly
due to insufficient floating-point precision.
v2:
patch this convert on 'max' value only.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6194f60c707e3878e120adeb36997075664d8429)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e535c23513c63f02f67e3e09e0787907029efeaf upstream.
Make sure to drop the reference taken to the DDC device during probe on
probe failure (e.g. probe deferral) and on driver unbind.
Fixes: fcbc51e54d2a ("staging: drm/imx: Add support for Television Encoder (TVEv2)")
Cc: stable@vger.kernel.org # 3.10
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20251030163456.15807-1-johan@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dedb897f11c5d7e32c0e0a0eff7cec23a8047167 upstream.
The hw clock gating register sequence consists of register value pairs
that are written to the GPU during initialisation.
The a690 hwcg sequence has two GMU registers in it that used to amount
to random writes in the GPU mapping, but since commit 188db3d7fe66
("drm/msm/a6xx: Rebase GMU register offsets") they trigger a fault as
the updated offsets now lie outside the mapping. This in turn breaks
boot of machines like the Lenovo ThinkPad X13s.
Note that the updates of these GMU registers is already taken care of
properly since commit 40c297eb245b ("drm/msm/a6xx: Set GMU CGC
properties on a6xx too"), but for some reason these two entries were
left in the table.
Fixes: 5e7665b5e484 ("drm/msm/adreno: Add Adreno A690 support")
Cc: stable@vger.kernel.org # 6.5
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Konrad Dybcio <konradybcio@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Fixes: 188db3d7fe66 ("drm/msm/a6xx: Rebase GMU register offsets")
Patchwork: https://patchwork.freedesktop.org/patch/695778/
Message-ID: <20251221164552.19990-1-johan@kernel.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
(cherry picked from commit dcbd2f8280eea2c965453ed8c3c69d6f121e950b)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b0581f6ab952ffd135ca4402d2ee3da641538d6b upstream.
Tyr needs `CONFIG_COMMON_CLK` to build:
error[E0432]: unresolved import `kernel::clk::Clk`
--> drivers/gpu/drm/tyr/driver.rs:3:5
|
3 | use kernel::clk::Clk;
| ^^^^^^^^^^^^^^^^ no `Clk` in `clk`
error[E0432]: unresolved import `kernel::clk::OptionalClk`
--> drivers/gpu/drm/tyr/driver.rs:4:5
|
4 | use kernel::clk::OptionalClk;
| ^^^^^^^^^^^^^^^^^^^^^^^^ no `OptionalClk` in `clk`
Thus add the dependency to fix it.
Fixes: cf4fd52e3236 ("rust: drm: Introduce the Tyr driver for Arm Mali GPUs")
Cc: stable@vger.kernel.org
Acked-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260124160948.67508-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 051be49133971076717846e2a04c746ab3476282 upstream.
It looks I mistyped CS_DEBUG_MODE2 as CS_DEBUG_MODE1 when adding the
workaround. Fix it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: ca33cd271ef9 ("drm/xe/xelp: Add Wa_18022495364")
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.18+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260116095040.49335-1-tvrtko.ursulin@igalia.com
(cherry picked from commit 7fe6cae2f7fad2b5166b0fc096618629f9e2ebcb)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
drm_gem_change_handle_ioctl()
commit 12f15d52d38ac53f7c70ea3d4b3d76afed04e064 upstream.
Since GEM bo handles are u32 in the uapi and the internal implementation
uses idr_alloc() which uses int ranges, passing a new handle larger than
INT_MAX trivially triggers a kernel warning:
idr_alloc():
...
if (WARN_ON_ONCE(start < 0))
return -EINVAL;
...
Fix it by rejecting new handles above INT_MAX and at the same time make
the end limit calculation more obvious by moving into int domain.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reported-by: Zhi Wang <wangzhi@stu.xidian.edu.cn>
Fixes: 53096728b891 ("drm: Add DRM prime interface to reassign GEM handle")
Cc: David Francis <David.Francis@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: <stable@vger.kernel.org> # v6.18+
Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://lore.kernel.org/r/20260123141540.76540-1-tvrtko.ursulin@igalia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8a44241b0b83a6047c5448da1fff03fcc29496b5 ]
After a successful auxiliary_device_init(), aux_dev->dev.release
(xe_nvm_release_dev()) is responsible for the kfree(nvm). When
there is failure with auxiliary_device_add(), driver will call
auxiliary_device_uninit(), which call put_device(). So that the
.release callback will be triggered to free the memory associated
with the auxiliary_device.
Move the kfree(nvm) into the auxiliary_device_init() failure path
and remove the err goto path to fix below error.
"
[ 13.232905] ==================================================================
[ 13.232911] BUG: KASAN: double-free in xe_nvm_init+0x751/0xf10 [xe]
[ 13.233112] Free of addr ffff888120635000 by task systemd-udevd/273
[ 13.233120] CPU: 8 UID: 0 PID: 273 Comm: systemd-udevd Not tainted 6.19.0-rc2-lgci-xe-kernel+ #225 PREEMPT(voluntary)
...
[ 13.233125] Call Trace:
[ 13.233126] <TASK>
[ 13.233127] dump_stack_lvl+0x7f/0xc0
[ 13.233132] print_report+0xce/0x610
[ 13.233136] ? kasan_complete_mode_report_info+0x5d/0x1e0
[ 13.233139] ? xe_nvm_init+0x751/0xf10 [xe]
...
"
v2: drop err goto path. (Alexander)
Fixes: 7926ba2143d8 ("drm/xe: defer free of NVM auxiliary container to device release callback")
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Brian Nguyen <brian3.nguyen@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Suggested-by: Brian Nguyen <brian3.nguyen@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20260120183239.2966782-7-shuicheng.lin@intel.com
(cherry picked from commit a3187c0c2bbd947ffff97f90d077ac88f9c2a215)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2da8fbb8f1c17129a08c1e0e42c71eabdca76062 ]
Move nvm teardown to a devm-managed action registered from xe_nvm_init().
This ensures the auxiliary NVM device is deleted on probe failure and
device detach without requiring explicit calls from remove paths.
As part of this, drop xe_nvm_fini() from xe_device_remove() and from the
survivability sysfs teardown, and remove the public xe_nvm_fini() API from
the header.
This is to fix below warn message when there is probe failure after
xe_nvm_init(), then xe_device_probe() is called again:
"
[ 207.318152] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/xe.nvm.768'
[ 207.318157] CPU: 5 UID: 0 PID: 10261 Comm: modprobe Tainted: G B W 6.19.0-rc2-lgci-xe-kernel+ #223 PREEMPT(voluntary)
[ 207.318160] Tainted: [B]=BAD_PAGE, [W]=WARN
[ 207.318161] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
[ 207.318163] Call Trace:
[ 207.318163] <TASK>
[ 207.318165] dump_stack_lvl+0xa0/0xc0
[ 207.318170] dump_stack+0x10/0x20
[ 207.318171] sysfs_warn_dup+0xd5/0x110
[ 207.318175] sysfs_create_dir_ns+0x1f6/0x280
[ 207.318177] ? __pfx_sysfs_create_dir_ns+0x10/0x10
[ 207.318179] ? lock_acquire+0x1a4/0x2e0
[ 207.318182] ? __kasan_check_read+0x11/0x20
[ 207.318185] ? do_raw_spin_unlock+0x5c/0x240
[ 207.318187] kobject_add_internal+0x28d/0x8e0
[ 207.318189] kobject_add+0x11f/0x1f0
[ 207.318191] ? __pfx_kobject_add+0x10/0x10
[ 207.318193] ? lockdep_init_map_type+0x4b/0x230
[ 207.318195] ? get_device_parent.isra.0+0x43/0x4c0
[ 207.318197] ? kobject_get+0x55/0xf0
[ 207.318199] device_add+0x2d7/0x1500
[ 207.318201] ? __pfx_device_add+0x10/0x10
[ 207.318203] ? lockdep_init_map_type+0x4b/0x230
[ 207.318205] __auxiliary_device_add+0x99/0x140
[ 207.318208] xe_nvm_init+0x7a2/0xef0 [xe]
[ 207.318333] ? xe_devcoredump_init+0x80/0x110 [xe]
[ 207.318452] ? __devm_add_action+0x82/0xc0
[ 207.318454] ? fs_reclaim_release+0xc0/0x110
[ 207.318457] xe_device_probe+0x17dd/0x2c40 [xe]
[ 207.318574] ? __pfx___drm_dev_dbg+0x10/0x10
[ 207.318576] ? add_dr+0x180/0x220
[ 207.318579] ? __pfx___drmm_mutex_release+0x10/0x10
[ 207.318582] ? __pfx_xe_device_probe+0x10/0x10 [xe]
[ 207.318697] ? xe_pm_init_early+0x33a/0x410 [xe]
[ 207.318850] xe_pci_probe+0x936/0x1250 [xe]
[ 207.318999] ? lock_acquire+0x1a4/0x2e0
[ 207.319003] ? __pfx_xe_pci_probe+0x10/0x10 [xe]
[ 207.319151] local_pci_probe+0xe6/0x1a0
[ 207.319154] pci_device_probe+0x523/0x840
[ 207.319157] ? __pfx_pci_device_probe+0x10/0x10
[ 207.319159] ? sysfs_do_create_link_sd.isra.0+0x8c/0x110
[ 207.319162] ? sysfs_create_link+0x48/0xc0
...
"
Fixes: c28bfb107dac ("drm/xe/nvm: add on-die non-volatile memory device")
Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Brian Nguyen <brian3.nguyen@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20260120183239.2966782-6-shuicheng.lin@intel.com
(cherry picked from commit 11035eab1b7d88daa7904440046e64d3810b1ca1)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c1ed856c09d0d730c2f63bbb757cb6011db148f9 ]
Move pci_dev_put() after pci_dbg() to avoid using pdev after dropping its
reference.
Fixes: 2674f1ef29f46 ("drm/xe/configfs: Block runtime attribute changes")
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20260121173750.3090907-2-shuicheng.lin@intel.com
(cherry picked from commit 63b33604365bdca43dee41bab809da2230491036)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ee8d07cd5730038e33bf5e551448190bbd480eb8 ]
The power state check in amdgpu_dpm_set_powergating_by_smu() is done
before acquiring the pm mutex, leading to a race condition where:
1. Thread A checks state and thinks no change is needed
2. Thread B acquires mutex and modifies the state
3. Thread A returns without updating state, causing inconsistency
Fix this by moving the mutex lock before the power state check,
ensuring atomicity of the state check and modification.
Fixes: 6ee27ee27ba8 ("drm/amd/pm: avoid duplicate powergate/ungate setting")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7a3fbdfd19ec5992c0fc2d0bd83888644f5f2f38)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c73a8917b31e8ddbd53cc248e17410cec27f8f58 ]
For parallel exec queues, xe_exec_ioctl() copied the batch buffer address
array from userspace without checking num_batch_buffer.
If user creates a sync-only exec that doesn't use the address field, the
exec will fail with -EFAULT.
Add num_batch_buffer check to skip the copy, and the exec could be executed
successfully.
Here is the sync-only exec:
struct drm_xe_exec exec = {
.extensions = 0,
.exec_queue_id = qid,
.num_syncs = 1,
.syncs = (uintptr_t)&sync,
.address = 0, /* ignored for sync-only */
.num_batch_buffer = 0, /* sync-only */
};
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260122214053.3189366-2-shuicheng.lin@intel.com
(cherry picked from commit 4761791c1e736273d612ff564f318bfbbb04fa4e)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 6f287b1c8d0e255e94e54116ebbe126515f5c911 upstream.
Workqueue xe-ggtt-wq has been allocated using WQ_MEM_RECLAIM, but
the flag has been passed as 3rd parameter (max_active) instead
of 2nd (flags) creating the workqueue as per-cpu with max_active = 8
(the WQ_MEM_RECLAIM value).
So change this by set WQ_MEM_RECLAIM as the 2nd parameter with a
default max_active.
Fixes: 60df57e496e4 ("drm/xe: Mark GGTT work queue with WQ_MEM_RECLAIM")
Cc: stable@vger.kernel.org
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260108180148.423062-1-marco.crivellari@suse.com
(cherry picked from commit aa39abc08e77d66ebb0c8c9ec4cc8d38ded34dc9)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ca9e5115e870b9a531deb02752055a8a587904e3 upstream.
Page accounting can change via the shrinker without calling
xe_ttm_tt_unpopulate(), which normally updates page count tracepoints
through update_global_total_pages. Add a call to
update_global_total_pages when the shrinker successfully shrinks a BO.
v2:
- Don't adjust global accounting when pinning (Stuart)
Cc: stable@vger.kernel.org
Fixes: ce3d39fae3d3 ("drm/xe/bo: add GPU memory trace points")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20260107205732.2267541-1-matthew.brost@intel.com
(cherry picked from commit cc54eabdfbf0c5b6638edc50002cfafac1f1e18b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1a0f69e3c28477b97d3609569b7e8feb4b6162e8 upstream.
Fix several issues in dw_dp_bind() error handling:
1. Missing return after drm_bridge_attach() failure - the function
continued execution instead of returning an error.
2. Resource leak: drm_dp_aux_register() is not a devm function, so
drm_dp_aux_unregister() must be called on all error paths after
aux registration succeeds. This affects errors from:
- drm_bridge_attach()
- phy_init()
- devm_add_action_or_reset()
- platform_get_irq()
- devm_request_threaded_irq()
3. Bug fix: platform_get_irq() returns the IRQ number or a negative
error code, but the error path was returning ERR_PTR(ret) instead
of ERR_PTR(dp->irq).
Use a goto label for cleanup to ensure consistent error handling.
Fixes: 86eecc3a9c2e ("drm/bridge: synopsys: Add DW DPTX Controller support library")
Cc: stable@vger.kernel.org
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Link: https://patch.msgid.link/20260102155553.13243-1-osama.abdelkader@gmail.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 10343253328e0dbdb465bff709a2619a08fe01ad upstream.
Remove emit_frame_cntl function for gfx v12, which is not support.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5aaa5058dec5bfdcb24c42fe17ad91565a3037ca)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 604826acb3f53c6648a7ee99a3914ead680ab7fb upstream.
Apparently we never actually filled these in, despite the fact that we do
in fact technically support atomic modesetting.
Since not having these filled in causes us to potentially forget to disable
fbdev and friends during suspend/resume, let's fix it.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260121191320.210342-1-lyude@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
unknown connectors
[ Upstream commit d0bd10792d6cc3725ddee43f03fd6ee234f24844 ]
* Implement missing DCB connectors in uconn.c previously defined in conn.h.
* Replace kernel WARN_ON macro with printk message to more gracefully signify
an unknown connector was encountered.
With this patch, unknown connectors are explicitly marked with value 0
(DCB_CONNECTOR_VGA) to match the tested current behavior. Although 0xff
(DCB_CONNECTOR_NONE) may be more suitable, I don't want to introduce a
breaking change.
Fixes: 8b7d92cad953 ("drm/nouveau/kms/nv50-: create connectors based on nvkm info")
Link: https://download.nvidia.com/open-gpu-doc/DCB/1/DCB-4.0-Specification.html#_connector_table_entry
Signed-off-by: Alex Ramírez <lxrmrz732@rocketmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[Lyude: Remove unneeded parenthesis around nvkm_warn()]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20251213005327.9495-3-lxrmrz732@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3036b4ce4b209af690fa776e4616925892caba4c ]
* Add missing DCB connectors in conn.h as per the NVIDIA DCB specification.
A lot of connector logic was rewritten for Linux v6.5; some display connector types
went unaccounted-for which caused kernel warnings on devices with the now-unsupported
DCB connectors. This patch adds all of the DCB connectors as defined by NVIDIA to the
dcb_connector_type enum to bring back support for these connectors to the new logic.
Fixes: 8b7d92cad953 ("drm/nouveau/kms/nv50-: create connectors based on nvkm info")
Link: https://download.nvidia.com/open-gpu-doc/DCB/1/DCB-4.0-Specification.html#_connector_table_entry
Signed-off-by: Alex Ramírez <lxrmrz732@rocketmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[Lyude: Clarify DCB_CONNECTOR_HDMI_0 weirdness in comments]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20251213005327.9495-2-lxrmrz732@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 095ca815174e51fc0049771712d5455cabd7231e ]
Needs to be a u64.
Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 56fff1941abd3ca3b6f394979614ca7972552f7f)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 764a90eb02268a23b1bb98be5f4a13671346804a ]
Radeon 430 and 520 are OEM GPUs from 2016~2017
They have the same device id: 0x6611 and revision: 0x87
On the Radeon 430, powertune is buggy and throttles the GPU,
never allowing it to reach its maximum SCLK. Work around this
bug by raising the TDP limits we program to the SMC from
24W (specified by the VBIOS on Radeon 430) to 32W.
Disabling powertune entirely is not a viable workaround,
because it causes the Radeon 520 to heat up above 100 C,
which I prefer to avoid.
Additionally, revise the maximum SCLK limit. Considering the
above issue, these GPUs never reached a high SCLK on Linux,
and the workarounds were added before the GPUs were released,
so the workaround likely didn't target these specifically.
Use 780 MHz (the maximum SCLK according to the VBIOS on the
Radeon 430). Note that the Radeon 520 VBIOS has a higher
maximum SCLK: 905 MHz, but in practice it doesn't seem to
perform better with the higher clock, only heats up more.
v2:
Move the workaround to si_populate_smc_tdp_limits.
Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 966d70f1e160bdfdecaf7ff2b3f22ad088516e9f)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d5077426e1a76d269e518e048bde2e9fc49b32ad ]
There is no reason to clear the SMC table.
We also don't need to recalculate the power limit then.
Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e214d626253f5b180db10dedab161b7caa41f5e9)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4ca284c6d15dda481f714e3687a1d5fb70b3bf5c ]
Use WREG32 to write mmCG_THERMAL_INT.
This is a direct access register.
Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2555f4e4a741d31e0496572a8ab4f55941b4e30e)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f262015b9797effdec15e8a81c81b2158ede9578 ]
Previously, the driver's internal wedged.mode state was updated without
verifying whether the corresponding engine reset policy update in GuC
succeeded. This could leave the driver reporting a wedged.mode state
that doesn't match the actual reset behavior programmed in GuC.
With this change, the reset policy is updated first, and the driver's
wedged.mode state is modified only if the policy update succeeds on all
available GTs.
This patch also introduces two functional improvements:
- The policy is sent to GuC only when a change is required. An update
is needed only when entering or leaving XE_WEDGED_MODE_UPON_ANY_HANG,
because only in that case the reset policy changes. For example,
switching between XE_WEDGED_MODE_UPON_CRITICAL_ERROR and
XE_WEDGED_MODE_NEVER doesn't affect the reset policy, so there is no
need to send the same value to GuC.
- An inconsistent_reset flag is added to track cases where reset policy
update succeeds only on a subset of GTs. If such inconsistency is
detected, future wedged mode configuration will force a retry of the
reset policy update to restore a consistent state across all GTs.
Fixes: 6b8ef44cc0a9 ("drm/xe: Introduce the wedged_mode debugfs")
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-3-lukasz.laguna@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 0f13dead4e0385859f5c9c3625a19df116b389d3)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 50a59230fa63989d59253622a8dd6386cca0db07 ]
Add a scope-based helpers for runtime PM that may be used to simplify
cleanup logic and potentially avoid goto-based cleanup.
For example, using
guard(xe_pm_runtime)(xe);
will get runtime PM and cause a corresponding put to occur automatically
when the current scope is exited. 'xe_pm_runtime_noresume' can be used
as a guard replacement for the corresponding 'noresume' variant.
There's also an xe_pm_runtime_ioctl conditional guard that can be used
as a replacement for xe_runtime_ioctl():
ACQUIRE(xe_pm_runtime_ioctl, pm)(xe);
if ((ret = ACQUIRE_ERR(xe_pm_runtime_ioctl, &pm)) < 0)
/* failed */
In a few rare cases (such as gt_reset_worker()) we need to ensure that
runtime PM is dropped when the function is exited by any means
(including error paths), but the function does not need to acquire
runtime PM because that has already been done earlier by a different
function. For these special cases, an 'xe_pm_runtime_release_only'
guard can be used to handle the release without doing an acquisition.
These guards will be used in future patches to eliminate some of our
goto-based cleanup.
v2:
- Specify success condition for xe_pm runtime_ioctl as _RET >= 0 so
that positive values will be properly identified as success and
trigger destructor cleanup properly.
v3:
- Add comments to the kerneldoc for the existing 'get' functions
indicating that scope-based handling should be preferred where
possible. (Gustavo)
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-31-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 59e7528dbfd52efbed05e0f11b2143217a12bc74)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Stable-dep-of: f262015b9797 ("drm/xe: Update wedged.mode only after successful reset policy change")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 772157f626d0e1a7c6d49dffb0bbe4b2343a1d44 ]
We are meant to be checking the user vm for the bind queue, but actually
we are checking the migrate vm. For various reasons this is not
currently firing but this will likely change in the future.
Now that we have the user_vm attached to the bind queue, we can fix this
by directly checking that here.
Fixes: dba89840a920 ("drm/xe: Add GT TLB invalidation jobs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Arvind Yadav <arvind.yadav@intel.com>
Link: https://patch.msgid.link/20260120110609.77958-4-matthew.auld@intel.com
(cherry picked from commit 9dd1048bca4fe2aa67c7a286bafb3947537adedb)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6f4b7aed61817624250e590ba0ef304146d34614 ]
Currently this is very broken if someone attempts to create a bind
queue and share it across multiple VMs. For example currently we assume
it is safe to acquire the user VM lock to protect some of the bind queue
state, but if allow sharing the bind queue with multiple VMs then this
quickly breaks down.
To fix this reject using a bind queue with any VM that is not the same
VM that was originally passed when creating the bind queue. This a uAPI
change, however this was more of an oversight on kernel side that we
didn't reject this, and expectation is that userspace shouldn't be using
bind queues in this way, so in theory this change should go unnoticed.
Based on a patch from Matt Brost.
v2 (Matt B):
- Hold the vm lock over queue create, to ensure it can't be closed as
we attach the user_vm to the queue.
- Make sure we actually check for NULL user_vm in destruction path.
v3:
- Fix error path handling.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Arvind Yadav <arvind.yadav@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Link: https://patch.msgid.link/20260120110609.77958-3-matthew.auld@intel.com
(cherry picked from commit 9dd08fdecc0c98d6516c2d2d1fa189c1332f8dab)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Stable-dep-of: 772157f626d0 ("drm/xe/migrate: fix job lock assert")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|