kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
47 hours	drm/amdgpu: drop retry loop in amdgpu_hmm_range_get_pages	Honglei Huang	1	-8/+1
	commit 342981fff32802a819d6fc7cf3c9fedf9f3d9d60 upstream. Since commit c08972f55594 ("drm/amdgpu: fix amdgpu_hmm_range_get_pages") moved mmu_interval_read_begin() out of the per-chunk loop, the captured notifier_seq is no longer refreshed across retries. As a result, the existing -EBUSY retry path can never make progress: hmm_range_fault() returns -EBUSY only when mmu_interval_check_retry(notifier, notifier_seq) reports that the sequence is stale. Once the sequence has advanced, the stored seq will never match again, so every subsequent call within the same invocation returns -EBUSY immediately. The "goto retry" therefore degenerates into a busy spin that simply burns CPU for the full HMM_RANGE_DEFAULT_TIMEOUT (~1s) window before finally bailing out with -EAGAIN. This is pure latency with no chance of recovery, and it actively hurts the KFD userptr stack: the caller ends up blocked for a second while holding mmap_lock, only to return -EAGAIN to the restore worker (or to userspace) which would have re-driven the operation immediately anyway. Drop the retry/timeout entirely and let -EBUSY propagate straight to out_free_pfns, where it is already translated to -EAGAIN. Recovery is handled at a higher level: the KFD restore_userptr_worker reschedules itself, and the userptr ioctl path returns -EAGAIN to userspace. No functional regression: the previous behaviour on -EBUSY was already to fail with -EAGAIN after a 1s stall; we just skip the stall. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Honglei Huang <honghuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	RDMA/umem: Fix truncation for block sizes >= 4G	Jason Gunthorpe	1	-2/+2
	[ Upstream commit 15fe76e23615f502d051ef0768f86babaf08746c ] When the iommu is used the linearization of the mapping can give a single block that is very large split across multiple SG entries. When __rdma_block_iter_next() reassembles the split SG entries it is overflowing the 32 bit stack values and computed the wrong DMA addresses for blocks after the truncation. Use the right types to hold DMA addresses. Link: https://patch.msgid.link/r/1-v1-88303e9e509f+f7-ib_umem_types_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: a808273a495c ("RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	RDMA: Move DMA block iterator logic into dedicated files	Leon Romanovsky	18	-51/+59
	[ Upstream commit 6094ea64c69520ed1e770e7c79c43412de202bfa ] The DMA iterator logic was mixed into verbs and umem-specific code, forcing all users to include rdma/ib_umem.h. Move the block iterator logic into iter.c and rdma/iter.h so that rdma/ib_umem.h and rdma/ib_verbs.h can be separated in a follow-up patch. Link: https://patch.msgid.link/20260213-refactor-umem-v1-1-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Stable-dep-of: 15fe76e23615 ("RDMA/umem: Fix truncation for block sizes >= 4G") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	RDMA: During rereg_mr ensure that REREG_ACCESS is compatible	Jason Gunthorpe	6	-0/+37
	commit badad6fad60def1b9805559dd81dbab3d97b82aa upstream. If IB_MR_REREG_ACCESS changes from RO to RW then the umem has to be re-evaluated to ensure it is properly pinned as RW. Since the umem is hidden inside each driver's mr struct add a ib_umem_check_rereg() function that each driver has to call before processing IB_MR_REREG_ACCESS. mlx4 has to retain its duplicate ib_access_writable check because it implements IB_MR_REREG_ACCESS \| IB_MR_REREG_TRANS by changing both items in place sequentially while the MR is live, so it will continue to not support this combination. Cc: stable@vger.kernel.org Fixes: b40656aa7d55 ("RDMA/umem: remove FOLL_FORCE usage") Link: https://patch.msgid.link/r/0-v1-06fb1a2d6cf5+107-rereg_access_jgg@nvidia.com Reported-by: Philip Tsukerman <philiptsukerman@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	driver core: reject devices with unregistered buses	Johan Hovold	1	-2/+9
	commit 36f35b8df6972167102a1c3d4361e0afb6a84534 upstream. Trying to register a device on a bus which has not yet been registered used to trigger a NULL-pointer dereference, but since the const bus structure rework registration instead succeeds without the device being added to the bus. This specifically means that the device will never bind to a driver and that the bus sysfs attributes are not created (i.e. as if the device had no bus). Reject devices with unregistered buses to catch any callers that get the ordering wrong and to handle bus registration failures more gracefully. Fixes: 5221b82d46f2 ("driver core: bus: bus_add/probe/remove_device() cleanups") Cc: stable@vger.kernel.org # 6.3 Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20260430091718.230228-1-johan@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	driver core: faux: fix root device registration	Johan Hovold	1	-12/+10
	commit 580a795105dae2ef1622df72a27a8fb0605e2f6b upstream. A recent change made the faux bus root device be allocated dynamically but failed to provide a release function to free the memory when the last reference is dropped (on theoretical failure to register the device or bus). Fix this by using root_device_register() instead of open coding. Also add the missing sanity check when registering faux devices to avoid use-after-free if the bus failed to register (which would previously have triggered a bunch of use-after-free warnings). Fixes: 61b76d07d2b4 ("driver core: faux: stop using static struct device") Cc: stable@vger.kernel.org # 7.0 Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20260424153127.2647405-2-johan@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/display: Use krealloc_array() in dal_vector_reserve()	Harry Wentland	1	-2/+2
	commit da48bc4461b8a5ebfb9264c9b191a701d8e99009 upstream. [Why & How] dal_vector_reserve() computes the allocation size as "capacity * vector->struct_size" using uint32_t arithmetic, which can silently wrap to a small value on overflow. This would cause krealloc to return a smaller buffer than expected, leading to heap overflows on subsequent vector appends. Replace krealloc() with krealloc_array() which performs an internal overflow check and returns NULL on wrap, preventing the issue. Fixes: 2004f45ef83f ("drm/amd/display: Use kernel alloc/free") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 37668568641ccc4cc1dbca4923d0a16609dd5707) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/display: Fix out-of-bounds read in dp_get_eq_aux_rd_interval()	Harry Wentland	1	-1/+1
	commit e8b4d37eba05141ee01794fc6b7f2da808cee83b upstream. [Why & How] The aux_rd_interval array in struct dc_lttpr_caps is declared with MAX_REPEATER_CNT - 1 (7) elements, indexed 0..6. However, the offset parameter passed to dp_get_eq_aux_rd_interval() can be as large as MAX_REPEATER_CNT (8) when a sink reports 8 LTTPR repeaters via DPCD. This leads to an out-of-bounds read of aux_rd_interval[7] when offset is 8. Fix this by growing aux_rd_interval to MAX_REPEATER_CNT elements to accommodate the full range of valid repeater counts defined by the DP spec. Assisted-by: GitHub Copilot:Claude claude-4-opus Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a55a458a8df37a65ffda5cf721d554a8f74f6b04) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/display: Fix NULL deref and buffer over-read in SDP debugfs	Harry Wentland	1	-0/+5
	commit adf67034b1f61f7119295208085bfd43f85f56af upstream. [Why & How] dp_sdp_message_debugfs_write() dereferences connector->base.state->crtc without checking for NULL. A connector can be connected but not bound to any CRTC (e.g. after hot-plug before the next atomic commit), causing a kernel crash when writing to the sdp_message debugfs node. The function also ignores the user-provided size argument and always passes 36 bytes to copy_from_user(), reading past the user buffer when size < 36. Fix both issues by: - Returning -ENODEV when connector->base.state or state->crtc is NULL - Clamping write_size to min(size, sizeof(data)) Fixes: c7ba3653e977 ("drm/amd/display: Generic SDP message access in amdgpu") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6ab4c36a522842ff70474a1c0af2e40e50fc8300) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/display: add missing CSC entries for BT.2020 for DCE IPs	Leorize	2	-2/+18
	commit 6590fe323ce2807f5d9454e7fccf3fab875d4352 upstream. DCE-based hardware does not have the CSC matrices for BT.2020, which causes the driver to fallback to the GPU built-in matrices. This does not appear to cause any issues for RGB sinks, but causes major color artifacts for YCbCr ones (e.g. black becomes green). This commit adds the missing CSC matrices (taken from DC common) to DCE CSC tables, resolving the issue. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/3358 Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5333 Assisted-by: oh-my-pi:GPT-5.5 Signed-off-by: Leorize <leorize+oss@disroot.org> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 51e6668ab4baf55b082c376318d51ef965757196) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/display: Clamp VBIOS HDMI retimer register count to array size	Harry Wentland	1	-16/+32
	commit fb0707ce00eef4e2d60c3020e1c0432739703e4a upstream. [Why & How] The VBIOS integrated info tables (v1_11 and v2_1) contain HdmiRegNum and Hdmi6GRegNum fields that are used as loop bounds when copying retimer I2C register settings into fixed-size arrays (dp_ext_hdmi_reg_settings[9] and dp_ext_hdmi_6g_reg_settings[3]). These u8 fields are not validated before use, so a malformed VBIOS can specify values up to 255, causing an out-of-bounds heap write during driver probe. Clamp each register count to the destination array size using min_t() before the copy loops, in both get_integrated_info_v11() and get_integrated_info_v2_1(). Assisted-by: GitHub Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5a7f0ef90195940c54b0f5bb85b87da55f038c69) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/display: Clamp HDMI HDCP2 rx_id_list read to buffer size	Harry Wentland	1	-1/+2
	commit f0f3981c43b32cadfe373d636d9e9ca522bb3702 upstream. [Why & How] During HDCP 2.x repeater authentication over HDMI, the driver reads the sink's RxStatus register and extracts a 10-bit message size field (max value 1023). This value is used as the read length for the ReceiverID list without being clamped to the size of the destination buffer rx_id_list[177]. A malicious HDMI repeater could advertise a message size larger than the buffer, causing an out-of-bounds write during the I2C read. Clamp the read length in mod_hdcp_read_rx_id_list() to the size of the rx_id_list buffer, matching the approach already used in the DP branch. Fixes: eff682f83c9c ("drm/amd/display: Add DDC handles for HDCP2.2") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 229212219e4247d9486f8ba41ef087358490be09) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/display: Bound VBIOS record-chain walk loops	Harry Wentland	3	-14/+33
	commit ff287df16a1a58aca78b08d1f3ee09fc44da0351 upstream. [Why & How] All record-chain walk loops in bios_parser.c and bios_parser2.c use for(;;) and only terminate on a 0xFF record_type sentinel or zero record_size. A malformed VBIOS image missing the terminator record causes unbounded iteration at probe time, potentially hundreds of thousands of iterations with record_size=1. In the final iterations near the BIOS image boundary, struct casts beyond the 2-byte header validated by GET_IMAGE can also read out of bounds. Cap all 14 record-chain walk loops to BIOS_MAX_NUM_RECORD (256) iterations. The atombios.h defines up to 22 distinct record types and atomfirmware.h has 13. Assuming an average of less than 10 records per type (which is reasonable since most are connector- based) 256 is a generous upper bound. Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Assisted-by: Copilot:claude-opus-4.6 Mythos Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 95700a3d660287ed657d6892f7be9ffc0e294a93) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/pm: smu_v14_0_0: use SoftMin for gfxclk in set_soft_freq_limited_range	Priya Hosur	1	-1/+2
	commit 03b70e0d8aa26bab89a0f1394c1c80a871925e42 upstream. In smu_v14_0_0_set_soft_freq_limited_range(), the gfxclk floor is programmed via SetHardMinGfxClk together with SetSoftMaxGfxClk. Under power_dpm_force_performance_level=high this pins HardMin to peak gfxclk. In PMFW arbitration HardMin has higher priority than SoftMax, so the firmware thermal/PPT throttler cannot clamp gfxclk via SoftMax once HardMin is set to peak. Replace SetHardMinGfxClk with SetSoftMinGfxclk so the driver still requests peak performance but the firmware throttler retains the ability to clamp gfxclk under thermal/PPT pressure. SoftMax handling is unchanged and no other clock domains are affected. Signed-off-by: Priya Hosur <Priya.Hosur@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3ea273267fd29cbf6d83ee72329f59eb5042605b) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/pm: mark metrics.energy_accumulator is invalid for smu 14.0.2	Yang Wang	1	-1/+0
	commit ee193c5bbd5e2b56bbeb54ef554414b43a6fc896 upstream. EnergyAccumulator is unsupported on SMU 14.0.2, mark it invalid. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 646b05043eeed04b51c14aad22a400a8250af4b7) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/pm: fix smu13 power limit default/cap calculation	Yang Wang	2	-29/+35
	commit bb204f19e4a115f094a6a3c4d82fcf48862d0766 upstream. smu_v13_0_0_get_power_limit() and smu_v13_0_7_get_power_limit() mix runtime power_limit with PP table limits when reporting default/min/max. When current power limit query succeeds, default_power_limit was set to the runtime value instead of the PP table default, and min/max could be derived from inconsistent bases (MsgLimits/runtime), leading to incorrect cap info. Use SocketPowerLimitAc/Dc as the PP default base (pp_limit), keep current_power_limit as runtime value, and derive min/max from pp_limit with OD percentages. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5227 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1eaf26db95901ca70737503a89b831dd763c8453) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amd/pm: apply SMU 13.0.10 workaround during MP1 unload	Yang Wang	1	-1/+9
	commit 2493d87bb4c31ec9ca7f0ef7257e33b8b175f913 upstream. On SMU v13.0.10, sending PrepareMp1ForUnload with the default parameter may leave the device in an inaccessible state. This can affect runtime power management and partial PnP flows. e.g: kexec, driver unload, boco/d3cold. Pass the required workaround parameter 0x55, when preparing MP1 for unload on SMU v13.0.10, keep the existing behavior for other SMU versions. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5133 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4e8ee1afeedb8d24dd22cdd5ae9f98a6d76ebe4b) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amdgpu: Fix incorrect VRAM GART mappings on non-4K page size systems	Donet Tom	1	-4/+8
	commit ec4c462e2d8161b32038e21e7187f4a15fe1661d upstream. When mapping VRAM pages into the GART page table, amdgpu_gart_map_vram_range() assumes that the system page size is the same as the GPU page size. On systems with non-4K page sizes, multiple GPU pages can exist within a single CPU page. As a result, the mappings are created incorrectly because fewer page table entries are programmed than required. Fix this by programming the mappings correctly for non-4K page size systems. Fixes: 237d623ae659 ("drm/amdgpu/gart: Add helper to bind VRAM pages (v2)") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a8f0bc22388f74e0cf4ed8b7d1846c580eaf44cc) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amdgpu: set noretry=1 as default for GFX 10.1.x (Navi10/12/14)	Vitaly Prosyak	1	-1/+1
	commit e47b0056a08dc70430ffc44bbf62197e7d1ff8ea upstream. Problem: While developing the amd_close_race IGT test (which intentionally triggers execute permission faults by removing VM_PAGE_EXECUTABLE from GPU page table entries), we discovered that on Navi10 (GFX 10.1.x) these faults produce zero diagnostic output. The GPU simply hangs silently for ~10s until the scheduler timeout fires. There is no way to distinguish an execute permission fault from any other type of GPU hang. Root cause: GFX 10.1.x defaults to noretry=0, which sets RETRY_PERMISSION_OR_INVALID_PAGE_FAULT=1 in the GFXHUB UTCL2 registers (gfxhub_v2_0.c line 313). With this bit set, permission faults (valid PTE, wrong R/W/X bits) are handled entirely within the UTCL1/UTCL2 hardware loop: UTCL2 returns an XNACK to UTCL1, and UTCL1 re-requests the translation indefinitely, expecting software to eventually fix the permission bits (as happens in SVM/HMM recovery). No interrupt of any kind reaches the IH ring. This is different from invalid-page faults (V=0) which DO generate a retry fault interrupt that the driver can escalate to a no-retry fault. Permission faults with valid PTEs loop silently forever in hardware. GFX 10.3+ already defaults to noretry=1, which makes permission faults generate immediate L2 protection fault interrupts. GFX 10.1.x was inadvertently left out of this default. Fix: Change the noretry=1 threshold from IP_VERSION(10, 3, 0) to IP_VERSION(10, 1, 0) in amdgpu_gmc_noretry_set(). This is a one-line change that aligns GFX 10.1.x behavior with GFX 10.3+ and all newer generations. With noretry=1, the existing non-retry fault handler (gmc_v10_0_process_interrupt) already decodes and prints the full GCVM_L2_PROTECTION_FAULT_STATUS register including PERMISSION_FAULTS, faulting address, VMID, PASID, and process name. No additional logging code is needed — the fix is purely routing permission faults to the existing, fully-capable non-retry interrupt handler. v2: Dropped GFX10-specific logging from gmc_v10_0.c and kfd_int_process_v10.c (Felix Kuehling). v1 added logging in the retry fault handler, but with noretry=1 permission faults take the non-retry path — the v1 retry handler code was dead and would never execute. Tested on Navi10 (GFX 10.1.10): - Execute permission faults now produce immediate, clear output: [gfxhub] page fault (src_id:0 ring:64 vmid:4 pasid:592) Process amd_close_race pid 13380 thread amd_close_race pid 13384 in page at address 0x40001000 from client 0x1b (UTCL2) GCVM_L2_PROTECTION_FAULT_STATUS:0x00700881 PERMISSION_FAULTS: 0x8 - No regressions with properly-mapped GPU workloads Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit eb21edd24c40d81066753f8ac6f23bce15745395) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amdgpu: restart the CS if some parts of the VM are still invalidated	Christian König	1	-1/+3
	commit 40396ffdf6120e2380706c59e1a84d7e765a37b6 upstream. Make sure that we only submit work with full up to date VM page tables. Backport to 7.1 and older. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 59720bfd8c6dbebeb8d5a7ab64241b007efd9213) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amdgpu: fix waiting for all submissions for userptrs	Christian König	1	-2/+4
	commit 58bafc666c484b21839a2d27e923ae1b2727a1df upstream. Wait for all submissions when userptrs need to be invalidated by the MMU notifier, not just the one the userptr was involved into. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91250893cbaa25c86872deca95a540d08de1f91e) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/v3d: Skip CSD when it has zeroed workgroups	Maíra Canal	1	-3/+13
	commit 7f93fad5ea0affc9e1505dd0f7596c0fdb496213 upstream. A compute shader dispatch encodes its workgroup counts in the CFG0..CFG2 registers. Kicking off a dispatch with a zero count in any of the three dimensions is invalid. First, the hardware will process 0 as 65536, while the user-space driver exposes a maximum of 65535. Over that, a submission with a zeroed workgroup dimension should be a no-op. These zeroed counts can reach the dispatch path through an indirect CSD job, whose workgroup counts are only known once the indirect buffer is read and may legitimately be zero, but such scenario should only result in a no-op. Overwrite the indirect CSD job workgroup counts with the indirect BO ones, even if they are zeroed, and don't submit the job to the hardware when any of the workgroup counts is zero, so the job completes immediately instead of running the shader. Cc: stable@vger.kernel.org Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.") Suggested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patch.msgid.link/20260602-v3d-fix-indirect-csd-v4-2-654309e32bc0@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/v3d: Fix vaddr leak when indirect CSD has zeroed workgroups	Maíra Canal	1	-1/+2
	commit ae7676952790f421c40918e2586a2c9f12a682b6 upstream. v3d_rewrite_csd_job_wg_counts_from_indirect() maps both the indirect buffer and the workgroup buffer and is expected to release them before returning. When any of the workgroup counts read from the buffer is zero, the function bailed out early and skipped the cleanup, leaking the vaddr mappings of both BOs. Jump to the cleanup path instead of returning directly, so the mappings are always dropped. Cc: stable@vger.kernel.org Fixes: 18b8413b25b7 ("drm/v3d: Create a CPU job extension for a indirect CSD job") Suggested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patch.msgid.link/20260602-v3d-fix-indirect-csd-v4-1-654309e32bc0@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/v3d: Fix global performance monitor reference counting	Maíra Canal	1	-5/+19
	commit 6bf7e2affc6e62da7add393d7f352d4040f5bc27 upstream. In the SET_GLOBAL ioctl, v3d_perfmon_find() bumps the reference count on the perfmon it returns, but v3d_perfmon_set_global_ioctl() and v3d_perfmon_delete() fail to release that reference on several paths: 1. v3d_perfmon_set_global_ioctl() leaks the reference on its error paths. 2. CLEAR_GLOBAL leaks both the find reference and the reference previously stashed in v3d->global_perfmon by the SET_GLOBAL ioctl that configured it. 3. Destroying a perfmon that is the current global perfmon leaks the reference stashed by the SET_GLOBAL ioctl. Release each of these references explicitly. Cc: stable@vger.kernel.org Fixes: c6eabbab359c ("drm/v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patch.msgid.link/20260531-v3d-perfmon-lifetime-v2-1-60ed4485a203@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/v3d: Wait for pending L2T flush before cleaning caches	Maíra Canal	1	-0/+8
	commit abf888b03a9805a3bc37948a0df443553b1c0910 upstream. v3d_clean_caches() starts the cache-clean sequence by writing V3D_L2TCACTL_TMUWCF to V3D_CTL_L2TCACTL and then polling for that bit to clear. It does not, however, check for an L2T flush (L2TFLS) that may still be in flight from a previous operation. On pre-V3D 7.1 hardware, kicking off the TMU write-combiner flush while an L2T flush is still pending can clobber bits in L2TCACTL and cause cache inconsistencies. Poll for L2TFLS to clear before writing L2TCACTL on V3D < 7.1, ensuring any pending flush has completed before a new clean is issued. Cc: stable@vger.kernel.org Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.") Link: https://patch.msgid.link/20260530-v3d-fix-rpi4-freezes-v1-1-c2c8307da6ce@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/xe: Clear pending_disable before signaling suspend fence	Tangudu Tilak Tirumalesh	1	-1/+1
	commit 54f2a0442a30fe7a0f6bc8345e81f8b2db8effbd upstream. In the schedule-disable done path for suspend, we signal the suspend fence before clearing pending_disable. That wakeup can let suspend_wait complete and resume be queued immediately. The resume path may then reach enable_scheduling() while pending_disable is still set and hit the !exec_queue_pending_disable(q) assertion. Fix this by clearing pending_disable before signaling the suspend fence, so any resumed transition observes a consistent state. Fixes: 87651f31ae4e ("drm/xe/guc_submit: fix race around suspend_pending") Cc: stable@vger.kernel.org # v7.0+ Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260603065217.3131066-3-tilak.tirumalesh.tangudu@intel.com (cherry picked from commit 4b1ae138b0e103d753773956a84eebc2edbf62c4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/xe/multi_queue: skip submit when primary queue is suspended	Niranjana Vishwanathapura	1	-1/+4
	commit ec4cbdd163f9bb2a2bd44eb93ecf4a2fa0e912a9 upstream. Return early in submit path when the multi-queue primary exec queue is suspended to avoid submitting while suspended. v2: Remove idle_skip_suspend fix as that feature is being reverted here https://patchwork.freedesktop.org/series/167262/ Fixes: bc5775c59258 ("drm/xe/multi_queue: Add GuC interface for multi queue support") Cc: stable@vger.kernel.org # v7.0+ Assisted-by: GitHub-Copilot:claude-sonnet-4.6 Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260603233946.863663-2-niranjana.vishwanathapura@intel.com (cherry picked from commit b7fb55cc3364ca128cfff9d50649ffd4327cd01e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/xe/display: fix oops in suspend/shutdown without display	Jani Nikula	1	-2/+9
	commit 68938cc08e23a94fd881e845837ff918de005ce7 upstream. The xe driver keeps track of whether to probe display, and whether display hardware is there, using xe->info.probe_display. It gets set to false if there's no display after intel_display_device_probe(). However, the display may also be disabled via fuses, detected at a later time in intel_display_device_info_runtime_init(). In this case, the xe driver does for_each_intel_crtc() on uninitialized mode config in xe_display_flush_cleanup_work(), leading to a NULL pointer dereference, and generally calls display code with display info cleared. Check for intel_display_device_present() after intel_display_device_info_runtime_init(), and reset xe->info.probe_display as necessary. Also do unset_display_features() for completeness, although display runtime init has already done that. This will need to be unified across all cases later. Move intel_display_device_info_runtime_init() call slightly earlier, similar to i915, to avoid a bunch of unnecessary setup for no display cases. Note #1: The xe driver has no business doing low level display plumbing like for_each_intel_crtc() to begin with. It all needs to happen in display code. Note #2: The actual bug is present already in commit 44e694958b95 ("drm/xe/display: Implement display support"), but the oops was likely introduced later at commit ddf6492e0e50 ("drm/xe/display: Make display suspend/resume work on discrete"). Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7904 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/6150 Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260515160920.1082842-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 7c3eb9f47533220888a67266448185fd0775d4da) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amdkfd: Fix buffer overflow in SDMA queue checkpoint/restore on GFX11	Andrew Martin	1	-8/+41
	commit 352ea59028ea48a6fff77f19ae28f98f71946a80 upstream. The v11 MQD manager incorrectly assigned the CP-compute variants of checkpoint_mqd/restore_mqd for KFD_MQD_TYPE_SDMA queues. These functions use sizeof(struct v11_compute_mqd) (2048 bytes) instead of sizeof(struct v11_sdma_mqd) (512 bytes), causing a 1536-byte overflow. During CRIU checkpoint of an SDMA queue on Navi3x: - checkpoint_mqd() reads 2048 bytes from a 512-byte SDMA MQD buffer, leaking 1536 bytes of adjacent GTT memory to userspace During CRIU restore: - restore_mqd() writes 2048 bytes into a 512-byte SDMA MQD buffer, corrupting 1536 bytes of adjacent GTT memory (often the ring buffer or neighboring MQDs) This is a copy-paste regression unique to v11. All other ASIC backends (cik, vi, v9, v10, v12) correctly use the SDMA-specific variants. Add checkpoint_mqd_sdma() and restore_mqd_sdma() functions that properly handle the smaller v11_sdma_mqd structure, matching the pattern used in other MQD managers. Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3") Assisted-by: Claude:Sonnet 4-5 Signed-off-by: Andrew Martin <andrew.martin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6fa41db7ffdec97d62433adf03b7b9b759af8c2c) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/amdkfd: fix NULL dereference in get_queue_ids()	Muhammad Bilal	1	-1/+1
	commit 2bd550b547deabef98bd3b017ff743b7c34d3a6d upstream. When usr_queue_id_array is NULL and num_queues is non-zero, get_queue_ids() returns NULL. The callers check only IS_ERR() on the return value; since IS_ERR(NULL) == false the check passes, and suspend_queues() calls q_array_invalidate() which immediately dereferences NULL while iterating num_queues times. Userspace can trigger this via kfd_ioctl_set_debug_trap() by supplying num_queues > 0 with a zero queue_array_ptr, causing a kernel panic. A NULL usr_queue_id_array with num_queues == 0 is a legitimate no-op (q_array_invalidate never executes, and resume_queues already guards all queue_ids dereferences behind a NULL check). Return ERR_PTR(-EINVAL) only when num_queues is non-zero and the pointer is absent; both callers already propagate IS_ERR() returns correctly to userspace. Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation") Signed-off-by: Muhammad Bilal <meatuni001@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f165a82cdf503884bb1797771c61b2fcc72113d4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/i915: Fix color blob reference handling in intel_plane_state	Chaitanya Kumar Borah	1	-0/+27
	commit 26eb7c0a7ab09d83eec833db6a5a2bc60b9d4d9a upstream. Take proper references for hw color blobs (degamma_lut, gamma_lut, ctm, lut_3d) in intel_plane_duplicate_state() and drop them in intel_plane_destroy_state(). v2: - handle blobs in hw state clear Cc: <stable@vger.kernel.org> #v6.19+ Fixes: 3b7476e786c2 ("drm/i915/color: Add framework to program PRE/POST CSC LUT") Fixes: a78f1b6baf4d ("drm/i915/color: Add framework to program CSC") Fixes: 65db7a1f9cf7 ("drm/i915/color: Add 3D LUT to color pipeline") Reviewed-by: Pranay Samala <pranay.samala@intel.com> #v1 Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Signed-off-by: Uma Shankar <uma.shankar@intel.com> Link: https://patch.msgid.link/20260601082953.128539-4-chaitanya.kumar.borah@intel.com (cherry picked from commit c6eea1925154b6697fe22b217faab9bb30635e6b) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	drm/gem: Try to fix change_handle ioctl, attempt 4	Simona Vetter	2	-41/+37
	commit 1a4f03d22fb655e5f192244fb2c87d8066fcfca2 upstream. [airlied: just added some comments on how to reenable] On-list because the cat is out of the bag and we're clearly not good enough to figure this out in private. The story thus far: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") tried to fix a race condition between the gem_close and gem_change_handle ioctls, but got a few things wrong: - There's a confusion with the local variable handle, which is actually the new handle, and so the two-stage trick was actually applied to the wrong idr slot. 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") tried to fix that by adding yet another code block, but forgot to add the error handling. Which meant we now have two paths, both kinda wrong. - dc366607c41c ("drm: Replace old pointer to new idr") tried to apply another fix, but inconsistently, again because of the handle confusion - this would be the right fix (kinda, somewhat, it's a mess) if we'd do the two-stage approach for the new handle. Except that wasn't the intent of the original fix. We also didn't have an igt merged for the original ioctl, which is a big no-go. This was attempted to address off-list in the original bugfix, and amd QA people claimed the bug was fixed now. Very clearly that's not the case. Here's my attempt to sort this out: - Rename the local variable to new_handle, the old aliasing with args->handle is just too dangerously confusing. - Merge the gem obj lookup with the two-stage idr_replace so that we avoid getting ourselves confused there. - This means we don't have a surplus temporary reference anymore, only an inherited from the idr. A concurrent gem_close on the new_handle could steal that. Fix that with the same two-stage approach create_tail uses. This is a bit overkill as documented in the comment, but I also don't trust my ability to understand this all correctly, so go with the established pattern we have from other ioctls instead for maximum paranoia. - Adjust error paths. I've tried to make the error and success paths common, because they are identical except for which handle is removed and on which we call idr_replace to (re)install the object again. But that made things messier to read, so I've left it at the more verbose version, which unfortunately hides the symmetry in the entire code flow a bit. - While at it, also replace the 7 space indent with 1 tab. And finally, because I flat out don't trust my abilities here at all anymore: - Disable the ioctl until we have the igt situation and everything else sorted out on-list and with full consensus. v2: Sashiko noticed that I didn't handle the error path for idr_replace correctly, it must be checked with IS_ERR_OR_NULL like in gem_handle_delete. So yeah, definitely should just the existing paths 1:1 because this is endless amounts of tricky. Also add the Fixes: line for the original ioctl, I forgot that too. Reported-by: DARKNAVY (@DarkNavyOrg) <vr@darknavy.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Fixes: dc366607c41c ("drm: Replace old pointer to new idr") Cc: syzbot+d7c9eed171647e421013@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Edward Adam Davis <eadavis@qq.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Cc: David Francis <David.Francis@amd.com> Cc: Puttimet Thammasaeng <pwn8official@gmail.com> Cc: Christian Koenig <Christian.Koenig@amd.com> Fixes: 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") Cc: Zhenghang Xiao <kipreyyy@gmail.com> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Reviewed-by: David Francis <David.Francis@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260604194437.1725314-1-simona.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	slimbus: qcom-ngd-ctrl: Avoid ABBA on tx_lock/ctrl->lock	Bjorn Andersson	1	-3/+0
	commit 55f2ea9ff83cc27a85526b14bc9b32f96a08d6ec upstream. During the SSR/PDR down notification the tx_lock is taken with the intent to provide synchronization with active DMA transfers. But during this period qcom_slim_ngd_down() is invoked, which ends up in slim_report_absent(), which takes the slim_controller lock. In multiple other codepaths these two locks are taken in the opposite order (i.e. slim_controller then tx_lock). The result is a lockdep splat, and a possible deadlock: rprocctl/449 is trying to acquire lock: ffff00009793e620 (&ctrl->lock){+.+.}-{4:4}, at: slim_report_absent (drivers/slimbus/core.c:322) slimbus but task is already holding lock: ffff00009793fb50 (&ctrl->tx_lock){+.+.}-{4:4}, at: qcom_slim_ngd_ssr_pdr_notify (drivers/slimbus/qcom-ngd-ctrl.c:1475) slim_qcom_ngd_ctrl which lock already depends on the new lock. Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ctrl->tx_lock); lock(&ctrl->lock); lock(&ctrl->tx_lock); lock(&ctrl->lock); The assumption is that the comment refers to the desire to not call qcom_slim_ngd_exit_dma() while we have an ongoing DMA TX transaction. But any such transaction is initiated and completed within a single qcom_slim_ngd_xfer_msg(). Prior to calling qcom_slim_ngd_exit_dma() the slim_controller is torn down, all child devices are notified that the slimbus is gone and the child devices are removed. Stop taking the tx_lock in qcom_slim_ngd_ssr_pdr_notify() to avoid the deadlock. Fixes: a899d324863a ("slimbus: qcom-ngd-ctrl: add Sub System Restart support") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-9-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	slimbus: qcom-ngd-ctrl: Balance pm_runtime enablement for NGD	Bjorn Andersson	1	-1/+5
	commit 6a003446b725c44b9e3ffa111b0effbaa2d43085 upstream. The pm_runtime_enable() and pm_runtime_use_autosuspend() calls are supposed to be balanced on exit, add these calls. Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-8-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	slimbus: qcom-ngd-ctrl: Correct PDR and SSR cleanup ownership	Bjorn Andersson	1	-2/+3
	commit 960b53a3f76fa214c2fc493734ae7b3c5e713bbf upstream. PDR and SSR callbacks are registred from the controller probe function, but currently released from the child device's remove function. The remove() function should only be unwinding what was done in the same device's probe() function. Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: stable@vger.kernel.org Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-5-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	slimbus: qcom-ngd-ctrl: Initialize controller resources in controller	Bjorn Andersson	1	-22/+16
	commit 07c564ea5fb859b7381429de935d5df4781947c6 upstream. The work structs and work queue are controller resources, create and destroy them in the controller context. Creating them as part of the child device's probe path seems to be okay now that the controller's probe has been updated, but if for some reason the child does not probe successfully a SSR or PDR notification will schedule_work() on an uninitialized "ngd_up_work". Move the initialization of these controller resources to the controller probe function to avoid any issues, and to clarify the ownership. Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: stable@vger.kernel.org Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-7-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	slimbus: qcom-ngd-ctrl: Register callbacks after creating the ngd	Bjorn Andersson	1	-19/+26
	commit 2a9d50e9ea406e0c8735938484adc20515ef1b47 upstream. When the remoteproc starts in parallel with the NGD driver being probed, or the remoteproc is already up when the PDR lookup is being registered, or in the theoretical event that we get an interrupt from the hardware, these callbacks will operate on uninitialized data. This result in issues to boot the affected boards. One such example can be seen in the following fault, where qcom_slim_ngd_ssr_pdr_notify() schedules work on the NULL ngd_up_work. [ 21.858578] ------------[ cut here ]------------ [ 21.858745] WARNING: kernel/workqueue.c:2338 at __queue_work+0x5e0/0x790, CPU#2: kworker/2:2/116 ... [ 21.859251] Call trace: [ 21.859255] __queue_work+0x5e0/0x790 (P) [ 21.859265] queue_work_on+0x6c/0xf0 [ 21.859273] qcom_slim_ngd_ssr_pdr_notify+0x110/0x150 [slim_qcom_ngd_ctrl] [ 21.859304] qcom_slim_ngd_ssr_notify+0x24/0x40 [slim_qcom_ngd_ctrl] [ 21.859318] notifier_call_chain+0xa4/0x230 [ 21.859329] srcu_notifier_call_chain+0x64/0xb8 [ 21.859338] ssr_notify_start+0x40/0x78 [qcom_common] [ 21.859355] rproc_start+0x130/0x230 [ 21.859367] rproc_boot+0x3d4/0x518 ... Move the enablement of interrupts, and the registration of SSR and PDR until after the NGD device has been registered. This could be further refined by moving initialization to the control driver probe and by removing the platform driver model from the picture. Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: stable@vger.kernel.org Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-6-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	slimbus: qcom-ngd-ctrl: Fix probe error path ordering	Bjorn Andersson	1	-6/+5
	commit 2c22ff152d380ec3d3af099fa05d0ac5ca9b4c1e upstream. qcom_slim_ngd_ctrl_probe() first registers the SSR callback then allocates the PDR context, as such the error path needs to come in opposite order to allow us to unroll each step. Fixes: 16f14551d0df ("slimbus: qcom-ngd: cleanup in probe error path") Cc: stable@vger.kernel.org Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-4-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	slimbus: qcom-ngd-ctrl: Fix up platform_driver registration	Bjorn Andersson	1	-3/+33
	commit 8663e8334d7b6007f5d8a4e5dd270246f35107a6 upstream. Device drivers should not invoke platform_driver_register()/unregister() in their probe and remove paths. They should further not rely on platform_driver_unregister() as their only means of "deleting" their child devices. Introduce a helper to unregister the child device and move the platform_driver_register()/unregister() to module_init()/exit(). Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: stable@vger.kernel.org Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-3-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	slimbus: qcom-ngd-ctrl: fix OF node refcount	Bartosz Golaszewski	1	-1/+1
	commit 120134fe75c6b0ae38f14eb8b548ad1e5761f912 upstream. Platform devices created with platform_device_alloc() call platform_device_release() when the last reference to the device's kobject is dropped. This function calls of_node_put() unconditionally. This works fine for devices created with platform_device_register_full() but users of the split approach (platform_device_alloc() + platform_device_add()) must bump the reference of the of_node they assign manually. Add the missing call to of_node_get(). Cc: stable@vger.kernel.org Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-2-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	thunderbolt: Limit XDomain response copy to actual frame size	Michael Bommarito	1	-1/+3
	commit 4db2bd2ed4785dbadaeeab9f4e346b21ac5fb8eb upstream. tb_xdomain_copy() copies req->response_size bytes from the received packet buffer regardless of the actual frame size. When a short response arrives, this reads past the valid frame data in the DMA pool buffer into stale contents from previous transactions. Use the minimum of frame size and expected response size for the copy length. Fixes: cdae7c07e3e3 ("thunderbolt: Add support for XDomain properties") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	thunderbolt: Validate XDomain request packet size before type cast	Michael Bommarito	1	-2/+6
	commit a504b9f2797b739e0304d537e8aa4ce883ecce39 upstream. tb_xdp_handle_request() casts the received packet buffer to protocol-specific structs without verifying that the allocation is large enough for the target type. A peer can send a minimal XDomain packet that passes the generic header length check but is shorter than the struct accessed after the cast, causing out-of- bounds reads from the kmemdup allocation. Plumb the packet length through xdomain_request_work and validate it against the expected struct size before each cast. Fixes: 8e1de7042596 ("thunderbolt: Add support for XDomain lane bonding") Fixes: cdae7c07e3e3 ("thunderbolt: Add support for XDomain properties") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	thunderbolt: Clamp XDomain response data copy to allocation size	Michael Bommarito	1	-0/+2
	commit 322e93448d908434ae5545660fcbe8f5a7a8e141 upstream. tb_xdp_properties_request() derives the per-packet copy length from the response header without checking that it fits in the previously allocated data buffer. A malicious peer can set its length field larger than the declared data_length, causing memcpy to write past the kcalloc allocation. Clamp the per-packet copy length so that the cumulative offset never exceeds data_len. Fixes: cdae7c07e3e3 ("thunderbolt: Add support for XDomain properties") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	thunderbolt: Bound root directory content to block size	Michael Bommarito	1	-0/+4
	commit 65423079c7420e3dbf9a7aa345c243a3f5752e5d upstream. __tb_property_parse_dir() does not check that content_offset + content_len fits within block_len for the root directory case. When rootdir->length equals or exceeds block_len - 2, the entry loop reads past the allocated property block. Add a bounds check after computing content_offset and content_len to reject directories whose content extends past the block. Fixes: cdae7c07e3e3 ("thunderbolt: Add support for XDomain properties") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	thunderbolt: Reject zero-length property entries in validator	Michael Bommarito	1	-0/+2
	commit cff8eb65d1eafe7793e54b4d0cf6bf831644630b upstream. tb_property_entry_valid() accepts entries with length == 0 for DIRECTORY, DATA, and TEXT types. A zero-length TEXT entry passes validation but causes an underflow in the null-termination logic: property->value.text[property->length * 4 - 1] = '\0'; When property->length is 0 this writes to offset -1 relative to the allocation. Reject zero-length entries early in the validator since they have no valid representation in the XDomain property protocol. Fixes: cdae7c07e3e3 ("thunderbolt: Add support for XDomain properties") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	rtase: Reset TX subqueue when clearing TX ring	Justin Lai	1	-0/+2
	commit ab1ecaabe74b7d86c38ab2ab44bd56cdcc33645a upstream. rtase_tx_clear() clears the TX ring and resets the ring indexes. However, the TX queue state and BQL accounting are not reset at the same time. This may leave __QUEUE_STATE_STACK_XOFF asserted after rtase_sw_reset(), preventing new TX packets from being scheduled. Reset the TX subqueue when clearing the TX ring so the TX queue state and BQL accounting are restored together. Fixes: 5a2a2f15244c ("rtase: Implement the rtase_down function") Cc: stable@vger.kernel.org Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260602114659.12335-1-justinlai0215@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	rtase: Avoid sleeping in get_stats64()	Justin Lai	1	-2/+3
	commit 9fc237f8d49f06d05f0f8e80361047b718894e81 upstream. The .ndo_get_stats64 callback must not sleep because it can be called when reading /proc/net/dev. rtase_get_stats64() calls rtase_dump_tally_counter(), which polls the tally counter dump bit with read_poll_timeout(). This may sleep while waiting for the hardware counter dump to complete. Use read_poll_timeout_atomic() instead to avoid sleeping in the get_stats64() path. Fixes: 079600489960 ("rtase: Implement net_device_ops") Cc: stable@vger.kernel.org Signed-off-by: Justin Lai <justinlai0215@realtek.com> Link: https://patch.msgid.link/20260603061816.31356-1-justinlai0215@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	pmdomain: ti_sci: add wakeup constraint to parent devices of wakeup source	Kendall Willis	1	-1/+1
	commit 4db207599acfc9d676340daa2dc6b52bfca17db4 upstream. Set wakeup constraint for any device in a wakeup path. All parent devices of a wakeup device should not be turned off during suspend. This ensures the wakeup device is kept on while the system is suspended. Cc: stable@vger.kernel.org Fixes: 9d8aa0dd3be4 ("pmdomain: ti_sci: add wakeup constraint management") Reported-by: Vitor Soares <vitor.soares@toradex.com> Closes: https://lore.kernel.org/linux-pm/c0fe43a2339c802e9ce5900092cd530a2ba17a6b.camel@gmail.com/ Signed-off-by: Kendall Willis <k-willis@ti.com> Reviewed-by: Sebin Francis <sebin.francis@ti.com> Signed-off-by: Ulf Hansson <ulfh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	pmdomain: imx: fix OF node refcount	Bartosz Golaszewski	1	-1/+1
	commit fba0510cd62666951dcc0221527edc0c47ae6599 upstream. for_each_child_of_node_scoped() decrements the reference count of the nod after each iteration. Assigning it without incrementing the refcount to a dynamically allocated platform device will result in a double put in platform_device_release(). Add the missing call to of_node_get(). Cc: stable@vger.kernel.org Fixes: 3e4d109ee8fc ("pmdomain: imx: gpc: Simplify with scoped for each OF child loop") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Ulf Hansson <ulfh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
47 hours	mmc: sdhci: add signal voltage switch in sdhci_resume_host	Jisheng Zhang	1	-0/+1
	commit f595e8e77a51eee35e331f69321766593a845ef2 upstream. I met one suspend/resume issue with sdr104 capable sdio wifi card (with "keep-power-in-suspend" set in DT property): After resuming from suspend to ram, the sdio wifi card stops working. Further debug shows that although ios shows the sdio card is at sdr104 mode, the voltage is still at 3V3. This is due to missing the calling of ->start_signal_voltage_switch() in sdhci_resume_host(). Fix this issue by adding ->start_signal_voltage_switch() in sdhci_resume_host(). This also matches what we do for sdhci_runtime_resume_host(). Then the question is: why this issue hasn't reported and fixed for so long time. IMHO, several reasons: Some host controllers just kick off the runtime resume for system resume, so they benefit from the well supported runtime pm code; Some platforms just use the old sdio wifi card which doesn't need signal voltage switch at all, the default voltage is 3v3 after resuming. Fixes: 6308d2905bd3 ("mmc: sdhci: add quirk for keeping card power during suspend") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulfh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>