summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
3 daysata: libata-core: fix cancellation of a port deferred qc workDamien Le Moal1-5/+3
commit 55db009926634b20955bd8abbee921adbc8d2cb4 upstream. cancel_work_sync() is a sleeping function so it cannot be called with the spin lock of a port being held. Move the call to this function in ata_port_detach() after EH completes, with the port lock released, together with other work cancellation calls. Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysata: libata-eh: correctly handle deferred qc timeoutsDamien Le Moal1-3/+19
commit eddb98ad9364b4e778768785d46cfab04ce52100 upstream. A deferred qc may timeout while waiting for the device queue to drain to be submitted. In such case, since the qc is not active, ata_scsi_cmd_error_handler() ends up calling scsi_eh_finish_cmd(), which frees the qc. But as the port deferred_qc field still references this finished/freed qc, the deferred qc work may eventually attempt to call ata_qc_issue() against this invalid qc, leading to errors such as reported by UBSAN (syzbot run): UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24 shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int' ... Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120 ubsan_epilogue+0xa/0x30 lib/ubsan.c:233 __ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494 ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166 ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679 process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275 process_scheduled_works kernel/workqueue.c:3358 [inline] worker_thread+0x5da/0xe40 kernel/workqueue.c:3439 kthread+0x370/0x450 kernel/kthread.c:467 ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Fix this by checking if the qc of a timed out SCSI command is a deferred one, and in such case, clear the port deferred_qc field and finish the SCSI command with DID_TIME_OUT. Reported-by: syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysnet: stmmac: dwmac-loongson: Set clk_csr_i to 100-150MHzHuacai Chen1-2/+2
commit e1aa5ef892fb4fa9014a25e87b64b97347919d37 upstream. Current clk_csr_i setting of Loongson STMMAC (including LS7A1000/2000 and LS2K1000/2000/3000) are copy & paste from other drivers. In fact, Loongson STMMAC use 125MHz clocks and need 62 freq division to within 2.5MHz, meeting most PHY MDC requirement. So fix by setting clk_csr_i to 100-150MHz, otherwise some PHYs may link fail. Cc: stable@vger.kernel.org Fixes: 30bba69d7db40e7 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Hongliang Wang <wanghongliang@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://patch.msgid.link/20260203062901.2158236-1-chenhuacai@loongson.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysDrivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RTJan Kiszka1-1/+65
commit f8e6343b7a89c7c649db5a9e309ba7aa20401813 upstream. Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V with related guest support enabled: [ 1.127941] hv_vmbus: registering driver hyperv_drm [ 1.132518] ============================= [ 1.132519] [ BUG: Invalid wait context ] [ 1.132521] 6.19.0-rc8+ #9 Not tainted [ 1.132524] ----------------------------- [ 1.132525] swapper/0/0 is trying to lock: [ 1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0 [ 1.132543] other info that might help us debug this: [ 1.132544] context-{2:2} [ 1.132545] 1 lock held by swapper/0/0: [ 1.132547] #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0 [ 1.132557] stack backtrace: [ 1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)} [ 1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025 [ 1.132567] Call Trace: [ 1.132570] <IRQ> [ 1.132573] dump_stack_lvl+0x6e/0xa0 [ 1.132581] __lock_acquire+0xee0/0x21b0 [ 1.132592] lock_acquire+0xd5/0x2d0 [ 1.132598] ? vmbus_chan_sched+0xc4/0x2b0 [ 1.132606] ? lock_acquire+0xd5/0x2d0 [ 1.132613] ? vmbus_chan_sched+0x31/0x2b0 [ 1.132619] rt_spin_lock+0x3f/0x1f0 [ 1.132623] ? vmbus_chan_sched+0xc4/0x2b0 [ 1.132629] ? vmbus_chan_sched+0x31/0x2b0 [ 1.132634] vmbus_chan_sched+0xc4/0x2b0 [ 1.132641] vmbus_isr+0x2c/0x150 [ 1.132648] __sysvec_hyperv_callback+0x5f/0xa0 [ 1.132654] sysvec_hyperv_callback+0x88/0xb0 [ 1.132658] </IRQ> [ 1.132659] <TASK> [ 1.132660] asm_sysvec_hyperv_callback+0x1a/0x20 As code paths that handle vmbus IRQs use sleepy locks under PREEMPT_RT, the vmbus_isr execution needs to be moved into thread context. Open- coding this allows to skip the IPI that irq_work would additionally bring and which we do not need, being an IRQ, never an NMI. This affects both x86 and arm64, therefore hook into the common driver logic. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Florian Bezdeka <florian.bezdeka@siemens.com> Tested-by: Florian Bezdeka <florian.bezdeka@siemens.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/exynos: vidi: fix to avoid directly dereferencing user pointerJeongjun Park1-4/+18
commit d4c98c077c7fb2dfdece7d605e694b5ea2665085 upstream. In vidi_connection_ioctl(), vidi->edid(user pointer) is directly dereferenced in the kernel. This allows arbitrary kernel memory access from the user space, so instead of directly accessing the user pointer in the kernel, we should modify it to copy edid to kernel memory using copy_from_user() and use it. Cc: <stable@vger.kernel.org> Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrm/exynos: vidi: use priv->vidi_dev for ctx lookup in vidi_connection_ioctl()Jeongjun Park2-1/+14
commit d3968a0d85b211e197f2f4f06268a7031079e0d0 upstream. vidi_connection_ioctl() retrieves the driver_data from drm_dev->dev to obtain a struct vidi_context pointer. However, drm_dev->dev is the exynos-drm master device, and the driver_data contained therein is not the vidi component device, but a completely different device. This can lead to various bugs, ranging from null pointer dereferences and garbage value accesses to, in unlucky cases, out-of-bounds errors, use-after-free errors, and more. To resolve this issue, we need to store/delete the vidi device pointer in exynos_drm_private->vidi_dev during bind/unbind, and then read this exynos_drm_private->vidi_dev within ioctl() to obtain the correct struct vidi_context pointer. Cc: <stable@vger.kernel.org> Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysata: libata-scsi: avoid Non-NCQ command starvationDamien Le Moal4-0/+106
commit 0ea84089dbf62a92dc7889c79e6b18fc89260808 upstream. When a non-NCQ command is issued while NCQ commands are being executed, ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing should be deferred by returning SCSI_MLQUEUE_XXX_BUSY. This command deferring is correct and as mandated by the ACS specifications since NCQ and non-NCQ commands cannot be mixed. However, in the case of a host adapter using multiple submission queues, when the target device is under a constant load of NCQ commands, there are no guarantees that requeueing the non-NCQ command will be executed later and it may be deferred again repeatedly as other submission queues can constantly issue NCQ commands from different CPUs ahead of the non-NCQ command. This can lead to very long delays for the execution of non-NCQ commands, and even complete starvation for these commands in the worst case scenario. Since the block layer and the SCSI layer do not distinguish between queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT implementation must ensure forward progress for non-NCQ commands in the presence of NCQ command traffic. This is similar to what SAS HBAs with a hardware/firmware based SAT implementation do. Implement such forward progress guarantee by limiting requeueing of non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is received and NCQ commands are in-flight, do not force a requeue of the non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0 to indicate that the command was accepted but hold on to the qc using the new deferred_qc field of struct ata_port. This deferred qc will be issued using the work item deferred_qc_work running the function ata_scsi_deferred_qc_work() once all in-flight commands complete, which is checked with the port qc_defer() callback return value indicating that no further delay is necessary. This check is done using the helper function ata_scsi_schedule_deferred_qc() which is called from ata_scsi_qc_complete(). This thus excludes this mechanism from all internal non-NCQ commands issued by ATA EH. When a port deferred_qc is non NULL, that is, the port has a command waiting for the device queue to drain, the issuing of all incoming commands (both NCQ and non-NCQ) is deferred using the regular busy mechanism. This simplifies the code and also avoids potential denial of service problems if a user issues too many non-NCQ commands. Finally, whenever ata EH is scheduled, regardless of the reason, a deferred qc is always requeued so that it can be retried once EH completes. This is done by calling the function ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids the need for any special processing for the deferred qc in case of NCQ error, link or device reset, or device timeout. Reported-by: Xingui Yang <yangxingui@huawei.com> Reported-by: Igor Pylypiv <ipylypiv@google.com> Fixes: bdb01301f3ea ("scsi: Add host and host template flag 'host_tagset'") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Tested-by: Igor Pylypiv <ipylypiv@google.com> Tested-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysata: libata-scsi: refactor ata_scsi_translate()Damien Le Moal1-31/+50
commit bb3a8154b1a1dc2c86d037482c0a2cf9186829ed upstream. Factor out of ata_scsi_translate() the code handling queued command deferral using the port qc_defer callback and issuing the queued command with ata_qc_issue() into the new function ata_scsi_qc_issue(), and simplify the goto used in ata_scsi_translate(). While at it, also add a lockdep annotation to check that the port lock is held when ata_scsi_translate() is called. No functional changes. Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysata: pata_ftide010: Fix some DMA timingsLinus Walleij1-3/+3
commit ff4a46c278ac6a4b3f39be1492a4568b6dcc6105 upstream. The FTIDE010 has been missing some timing settings since its inception, since the upstream OpenWrt patch was missing these. The community has since come up with the appropriate timings. Fixes: be4e456ed3a5 ("ata: Add driver for Faraday Technology FTIDE010") Cc: stable@vger.kernel.org Signed-off-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysusb: cdns3: fix role switching during resumeThomas Richard (TI)1-1/+1
commit 87e4b043b98a1d269be0b812f383881abee0ca45 upstream. If the role change while we are suspended, the cdns3 driver switches to the new mode during resume. However, switching to host mode in this context causes a NULL pointer dereference. The host role's start() operation registers a xhci-hcd device, but its probe is deferred while we are in the resume path. The host role's resume() operation assumes the xhci-hcd device is already probed, which is not the case, leading to the dereference. Since the start() operation of the new role is already called, the resume operation can be skipped. So skip the resume operation for the new role if a role switch occurs during resume. Once the resume sequence is complete, the xhci-hcd device can be probed in case of host mode. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000208 Mem abort info: ... Data abort info: ... [0000000000000208] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP Modules linked in: CPU: 0 UID: 0 PID: 146 Comm: sh Not tainted 6.19.0-rc7-00013-g6e64f4aabfae-dirty #135 PREEMPT Hardware name: Texas Instruments J7200 EVM (DT) pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : usb_hcd_is_primary_hcd+0x0/0x1c lr : cdns_host_resume+0x24/0x5c ... Call trace: usb_hcd_is_primary_hcd+0x0/0x1c (P) cdns_resume+0x6c/0xbc cdns3_controller_resume.isra.0+0xe8/0x17c cdns3_plat_resume+0x18/0x24 platform_pm_resume+0x2c/0x68 dpm_run_callback+0x90/0x248 device_resume+0x100/0x24c dpm_resume+0x190/0x2ec dpm_resume_end+0x18/0x34 suspend_devices_and_enter+0x2b0/0xa44 pm_suspend+0x16c/0x5fc state_store+0x80/0xec kobj_attr_store+0x18/0x2c sysfs_kf_write+0x7c/0x94 kernfs_fop_write_iter+0x130/0x1dc vfs_write+0x240/0x370 ksys_write+0x70/0x108 __arm64_sys_write+0x1c/0x28 invoke_syscall+0x48/0x10c el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0x108 el0t_64_sync_handler+0xa0/0xe4 el0t_64_sync+0x198/0x19c Code: 52800003 f9407ca5 d63f00a0 17ffffe4 (f9410401) ---[ end trace 0000000000000000 ]--- Cc: stable <stable@kernel.org> Fixes: 2cf2581cd229 ("usb: cdns3: add power lost support for system resume") Signed-off-by: Thomas Richard (TI) <thomas.richard@bootlin.com> Acked-by: Peter Chen <peter.chen@kernel.org> Link: https://patch.msgid.link/20260130-usb-cdns3-fix-role-switching-during-resume-v1-1-44c456852b52@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysPCI: dwc: ep: Always clear IB maps on BAR updateKoichiro Den1-3/+11
[ Upstream commit 8c746e22096579897d1f8f74dbb6b17a6862fb6d ] dw_pcie_ep_set_bar() currently tears down existing inbound mappings only when either the previous or the new struct pci_epf_bar uses submaps (num_submap != 0). If both the old and new mappings are BAR Match Mode, reprogramming the same ATU index is sufficient, so no explicit teardown was needed. However, some callers may reuse the same struct pci_epf_bar instance and update it in place before calling set_bar() again. In that case ep_func->epf_bar[bar] and the passed-in epf_bar can point to the same object, so we cannot reliably distinguish BAR Match Mode -> BAR Match Mode from Address Match Mode -> BAR Match Mode. As a result, the conditional teardown based on num_submap becomes unreliable and existing inbound maps may be left active. Call dw_pcie_ep_clear_ib_maps() unconditionally before reprogramming the BAR so that in-place updates are handled correctly. This introduces a behavioral change in a corner case: if a BAR reprogramming attempt fails (especially for the long-standing BAR Match Mode -> BAR Match Mode update case), the previously programmed inbound mapping will already have been torn down. This should be acceptable, since the caller observes the error and should not use the BAR for any real transactions in that case. While at it, document that the existing update parameter check is best-effort for in-place updates. Fixes: cc839bef7727 ("PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU") Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20260202145407.503348-3-den@valinux.co.jp Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Enable DAC in DCE link encoderTimur Kristóf6-17/+50
[ Upstream commit 4bd8b5f8bcb57b430c35494d8a2471ce5fd7661d ] Ensure that the DAC output is enabled at the correct time by moving it to the DCE link encoder similarly to how digital outputs are enabled. This also removes the call to DAC1EncoderControl from the DCE HWSS, which always felt like it was a hacky solution. Fixes: 0fbe321a93ce ("drm/amd/display: Implement DCE analog link encoders (v2)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Set CRTC source for DAC using registersTimur Kristóf6-37/+43
[ Upstream commit cbced93894d145239c83881d7fd953b7392c23a8 ] Apparently the VBIOS SelectCRTC_Source function overwrites a few registers (such as FMT_*) which DC writes in a different place, which can cause problems. Instead of using the SelectCRTC_Source function from the VBIOS, use the DAC_SOURCE_SELECT register directly, similarly to how it is done for digital link encoders. Fixes: 3be26d81b150 ("drm/amd/display: Support DAC in dce110_hwseq") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Initialize DAC in DCE link encoder using VBIOSTimur Kristóf2-2/+11
[ Upstream commit e2a024345bce78a8e1ed7d9e84c859b05979e41e ] The VBIOS DAC1EncoderControl() function can initialize the DAC, by writing board-specific values to certain registers. Call this at link encoder hardware initialization time similarly to how the equivalent UNIPHYTransmitterControl initialization is done. This fixes DAC output on the Radeon HD 7790. Also remove the ENCODER_CONTROL_SETUP enum from the dac_encoder_control_prepare_params function which is actually not a supported operation for DAC encoders. Fixes: 0fbe321a93ce ("drm/amd/display: Implement DCE analog link encoders (v2)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Mauro Rossi <issor.oruam@gmail.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Turn off DAC in DCE link encoder using VBIOSTimur Kristóf2-16/+17
[ Upstream commit e021ee995056ee7e58114edd92bcd4578d8b4bb5 ] Apparently, the VBIOS DAC1EncoderControl function is much more graceful about turning off the DAC. It writes various DAC registers in a specific sequence. Use that instead of just clearing the DAC_ENABLE register. Do this in just the dce110_link_encoder_disable_output function and remove it from the HWSS. Fixes: 0fbe321a93ce ("drm/amd/display: Implement DCE analog link encoders (v2)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Mauro Rossi <issor.oruam@gmail.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Don't call find_analog_engine() twiceTimur Kristóf1-1/+0
[ Upstream commit 613b1737abe1bd0a65b49851e777231302095e28 ] The analog engine is already there in the link_analog_engine variable and assigned to enc_init_data.analog_engine already. I suspect this was a rebase mistake. Fixes: 436d0d22aa70 ("drm/amd/display: Pass proper DAC encoder ID to VBIOS") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Use same max plane scaling limits for all 64 bpp formatsMario Kleiner1-0/+5
[ Upstream commit f0157ce46cf0e5e2257e19d590c9b16036ce26d4 ] The plane scaling hw seems to have the same min/max plane scaling limits for all 16 bpc / 64 bpp interleaved pixel color formats. Therefore add cases to amdgpu_dm_plane_get_min_max_dc_plane_scaling() for all the 16 bpc fixed-point / unorm formats to use the same .fp16 up/downscaling factor limits as used by the fp16 floating point formats. So far, 16 bpc unorm formats were not handled, and the default: path returned max/min factors for 32 bpp argb8888 formats, which were wrong and bigger than what many DCE / DCN hw generations could handle. The result sometimes was misscaling of framebuffers with DRM_FORMAT_XRGB16161616, DRM_FORMAT_ARGB16161616, DRM_FORMAT_XBGR16161616, DRM_FORMAT_ABGR16161616, leading to very wrong looking display, as tested on Polaris11 / DCE-11.2. So far this went unnoticed, because only few userspace clients used such 16 bpc unorm framebuffers, and those didn't use hw plane scaling, so they did not experience this issue. With upcoming Mesa 26 exposing 16 bpc unorm formats under both OpenGL and Vulkan under Wayland, and the upcoming GNOME 50 Mutter Wayland compositor allowing for direct scanout of these formats, the scaling hw will be used on these formats if possible for HiDPI display scaling, so it is important to use the correct hw scaling limits to avoid wrong display. Tested on AMD Polaris 11 / DCE 11.2 with upcoming Mesa 26 and GNOME 50 on HiDPI displays with scaling enabled. The mutter Wayland compositor now correctly falls back to scaling via desktop compositing instead of direct scanout, thereby avoiding wrong image display. For unscaled mode, it correctly uses direct scanout. Fixes: 580204038f5b ("drm/amd/display: Enable support for 16 bpc fixed-point framebuffers.") Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Tested-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Only use analog stream encoder with analog engineTimur Kristóf1-1/+4
[ Upstream commit 17ff034f805e032ed1358624a71381f9d6e29e9e ] Some GPUs have analog connectors that work with a DP bridge chip and don't actually have an internal DAC: Those should not use the analog stream encoders. Fixes: 5834c33fd3f6 ("drm/amd/display: Add concept of analog encoders (v2)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Only use analog link encoder with analog engineTimur Kristóf3-3/+6
[ Upstream commit f402898bd101af3166bde236b7f6a43d926e17a0 ] Some GPUs have analog connectors that work with a DP bridge chip and don't actually have an internal DAC: Those should not use the analog link encoder code path. Fixes: 0fbe321a93ce ("drm/amd/display: Implement DCE analog link encoders (v2)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Use DCE 6 link encoder for DCE 6 analog connectorsTimur Kristóf1-1/+1
[ Upstream commit 2de34fbcab2063cd3d52e5872a801b9a5fc755d0 ] DCE 6 should use the DCE 6 specific link encoder. This was a copy paste mistake. Fixes: 0fbe321a93ce ("drm/amd/display: Implement DCE analog link encoders (v2)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysefi: Fix reservation of unaccepted memory tableKiryl Shutsemau (Meta)1-4/+4
[ Upstream commit 0862438c90487e79822d5647f854977d50381505 ] The reserve_unaccepted() function incorrectly calculates the size of the memblock reservation for the unaccepted memory table. It aligns the size of the table, but fails to account for cases where the table's starting physical address (efi.unaccepted) is not page-aligned. If the table starts at an offset within a page and its end crosses into a subsequent page that the aligned size does not cover, the end of the table will not be reserved. This can lead to the table being overwritten or inaccessible, causing a kernel panic in accept_memory(). This issue was observed when starting Intel TDX VMs with specific memory sizes (e.g., > 64GB). Fix this by calculating the end address first (including the unaligned start) and then aligning it up, ensuring the entire range is covered by the reservation. Fixes: 8dbe33956d96 ("efi/unaccepted: Make sure unaccepted table is mapped") Reported-by: Moritz Sanft <ms@edgeless.systems> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysgpio: amd-fch: ionly return allowed values from amd_fch_gpio_get()Dmitry Torokhov1-3/+4
[ Upstream commit fbd03587ba732c612b8a569d1cf5bed72bd3a27c ] As of 86ef402d805d ("gpiolib: sanitize the return value of gpio_chip::get()") gpiolib requires drivers implementing GPIOs to only return 0, 1 or negative error for the get() callbacks. Ensure that amd-fch complies with this requirement. Fixes: 86ef402d805d ("gpiolib: sanitize the return value of gpio_chip::get()") Reported-and-tested-by: Tj <tj.iam.tj@proton.me> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Link: https://patch.msgid.link/aZTlwnvHt2Gho4yN@google.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/xe/bo: Redirect faults to dummy page for wedged deviceRaag Jadav1-1/+1
[ Upstream commit 4e83a8d58e1c721a89b3ffe15f549007080272e2 ] As per uapi documentation[1], the prerequisite for wedged device is to redirected page faults to a dummy page. Follow it. [1] Documentation/gpu/drm-uapi.rst v2: Add uapi reference and fixes tag (Matthew Brost) Fixes: 7bc00751f877 ("drm/xe: Use device wedged event") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260212055622.2054991-1-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit c020fff70d757612933711dd3cc3751d7d782d3c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/xe: Make xe_modparam.force_vram_bar_size signedShuicheng Lin1-1/+1
[ Upstream commit 1acec6ef0511b92e7974cc5a8768bfd3a659feaf ] vram_bar_size is registered as an int module parameter and is documented to accept negative values to disable BAR resizing. Store it as an int in xe_modparam as well, so negative values work as intended and the module_param type matches. Fixes: 80742a1aa26e ("drm/xe: Allow to drop vram resizing") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260202181853.1095736-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 25c9aa4dcb5ef2ad9f354d19f8f1eeb690d1c161) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/xe/vf: Avoid reading media version when media GT is disabledPiotr Piórkowski1-0/+6
[ Upstream commit 5e905ec67214444362b81345ef8fde63e58425b6 ] When the media GT is not allowed, a VF must not attempt to read the media version from the GuC. The GuC may not be loaded, and any attempt to communicate with it would result in a timeout and a VF probe failure: (...) [ 1912.406046] xe 0000:01:00.1: [drm] *ERROR* Tile0: GT1: GuC mmio request 0x5507: no reply 0x5507 [ 1912.407277] xe 0000:01:00.1: [drm] *ERROR* Tile0: GT1: [GUC COMMUNICATION] MMIO send failed (-ETIMEDOUT) [ 1912.408689] xe 0000:01:00.1: [drm] *ERROR* VF: Tile0: GT1: Failed to reset GuC state (-ETIMEDOUT) [ 1912.413986] xe 0000:01:00.1: probe with driver xe failed with error -110 Let's skip reading the media version for VFs when the media GT is not allowed. v2: move the condition directly to the VF path Fixes: 7abd69278bb5 ("drm/xe/configfs: Add attribute to disable GT types") Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260202115041.2863357-1-piotr.piorkowski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> (cherry picked from commit 0bcacf56dc0b265f9c47056c6a4f0c1394a8a3f0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/xe/xe2_hpg: Fix handling of Wa_14019988906 & Wa_14019877138Matt Roper1-10/+8
[ Upstream commit bc6387a2e0c1562faa56ce2a98cef50cab809e08 ] The PSS_CHICKEN register has been part of the RCS engine's LRC since it was first introduced in Xe_LP. That means that any workarounds that adjust its value (such as Wa_14019988906 and Wa_14019877138) need to be implemented in the lrc_was[] table so that they become part of the default LRC from which all subsequent LRCs are copied. Although these workarounds were implemented correctly on most platforms, they were incorrectly placed on the engine_was[] table for Xe2_HPG. Move the workarounds to the proper lrc_was[] table and switch the 'xe_rtp_match_first_render_or_compute' rule to specifically match the RCS since that's the engine whose LRC manages the register. Bspec: 65182 Fixes: 7f3ee7d88058 ("drm/xe/xe2hpg: Add initial GT workarounds") Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Link: https://patch.msgid.link/20260205220508.51905-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit e04c609eedf4d6748ac0bcada4de1275b034fed6) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/xe/mmio: Avoid double-adjust in 64-bit readsShuicheng Lin1-5/+5
[ Upstream commit 4a9b4e1fa52a6aaa1adbb7f759048df14afed54c ] xe_mmio_read64_2x32() was adjusting register addresses and then calling xe_mmio_read32(), which applies the adjustment again. This may shift accesses twice if adj_offset < adj_limit. There is no issue currently, as for media gt, adj_offset > adj_limit, so the 2nd adjust will be a no-op. But it may not work in future. To fix it, replace the adjusted-address comparison with a direct sanity check that ensures the MMIO address adjustment cutoff never falls within the 8-byte range of a 64-bit register. And let xe_mmio_read32() handle address translation. v2: rewrite the sanity check in a more natural way. (Matt) v3: Add Fixes tag. (Jani) Fixes: 07431945d8ae ("drm/xe: Avoid 64-bit register reads") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260130165621.471408-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit a30f999681126b128a43137793ac84b6a5b7443f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/xe/configfs: Fix 'parameter name omitted' errorsMichal Wajdeczko1-4/+8
[ Upstream commit 2a673fb4d787ce6672862cb693112378bff86abb ] On some configs and old compilers we can get following build errors: ../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_mid_bb': ../drivers/gpu/drm/xe/xe_configfs.h:40:76: error: parameter name omitted static inline u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev, enum xe_engine_class, ^~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_post_bb': ../drivers/gpu/drm/xe/xe_configfs.h:42:77: error: parameter name omitted static inline u32 xe_configfs_get_ctx_restore_post_bb(struct pci_dev *pdev, enum xe_engine_class, ^~~~~~~~~~~~~~~~~~~~ when trying to define our configfs stub functions. Fix that. Fixes: 7a4756b2fd04 ("drm/xe/lrc: Allow to add user commands mid context switch") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260203193745.576-1-michal.wajdeczko@intel.com (cherry picked from commit f59cde8a2452b392115d2af8f1143a94725f4827) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/xe/pf: Fix sysfs initializationMichal Wajdeczko1-28/+26
[ Upstream commit bf7172cd25ed182f30af2cbb9f80c730dc717d8e ] In case of devm_add_action_or_reset() failure the provided cleanup action will be run immediately on the not yet initialized kobject. This may lead to errors like: [ ] kobject: '(null)' (ff110001393608e0): is not initialized, yet kobject_put() is being called. [ ] WARNING: lib/kobject.c:734 at kobject_put+0xd9/0x250, CPU#0: kworker/0:0/9 [ ] RIP: 0010:kobject_put+0xdf/0x250 [ ] Call Trace: [ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe] [ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe] [ ] xe_sriov_init_late+0x5f/0x2c0 [xe] [ ] xe_device_probe+0x5f2/0xc20 [xe] [ ] xe_pci_probe+0x396/0x610 [xe] [ ] local_pci_probe+0x47/0xb0 [ ] refcount_t: underflow; use-after-free. [ ] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x68/0xb0, CPU#0: kworker/0:0/9 [ ] RIP: 0010:refcount_warn_saturate+0x68/0xb0 [ ] Call Trace: [ ] kobject_put+0x174/0x250 [ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe] [ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe] [ ] xe_sriov_init_late+0x5f/0x2c0 [xe] [ ] xe_device_probe+0x5f2/0xc20 [xe] [ ] xe_pci_probe+0x396/0x610 [xe] [ ] local_pci_probe+0x47/0xb0 Fix that by calling kobject_init() and kobject_add() separately and register cleanup action after the kobject is initialized. Also make this cleanup registration a part of the create helper to fix another mistake, as in the loop we were wrongly passing parent kobject while registering cleanup action, and this resulted in some undetected leaks. Fixes: 5c170a4d9c53 ("drm/xe/pf: Prepare sysfs for SR-IOV admin attributes") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260203235332.1350-1-michal.wajdeczko@intel.com (cherry picked from commit 98b16727f07e26a5d4de84d88805ce7ffcfdd324) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysgpio: cdev: Avoid NULL dereference in linehandle_create()Douglas Anderson1-1/+1
[ Upstream commit 6af6be278e3ba2ffb6af5b796c89dfb3f5d9063e ] In linehandle_create(), there is a statement like this: retain_and_null_ptr(lh); Soon after, there is a debug printout that dereferences "lh", which will crash things. Avoid the crash by using handlereq.lines, which is the same value. Fixes: da7e394bf58f ("gpio: convert linehandle_create() to FD_PREPARE()") Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patch.msgid.link/20260215120555.v2.1.I77c3eb563271c21870379eefd16ebbc4e09635bb@changeid Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysspi: wpcm-fiu: Fix potential NULL pointer dereference in wpcm_fiu_probe()Felix Gu1-1/+1
[ Upstream commit 888a0a802c467bbe34a42167bdf9d7331333440a ] platform_get_resource_byname() can return NULL, which would cause a crash when passed the pointer to resource_size(). Move the fiu->memory_size assignment after the error check for devm_ioremap_resource() to prevent the potential NULL pointer dereference. Fixes: 9838c182471e ("spi: wpcm-fiu: Add direct map support") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Reviewed-by: J. Neuschäfer <j.ne@posteo.net> Link: https://patch.msgid.link/20260212-wpcm-v1-1-5b7c4f526aac@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Fix out-of-bounds stream encoder index v3Srinivasan Shanmugam6-24/+24
[ Upstream commit abde491143e4e12eecc41337910aace4e8d59603 ] eng_id can be negative and that stream_enc_regs[] can be indexed out of bounds. eng_id is used directly as an index into stream_enc_regs[], which has only 5 entries. When eng_id is 5 (ENGINE_ID_DIGF) or negative, this can access memory past the end of the array. Add a bounds check using ARRAY_SIZE() before using eng_id as an index. The unsigned cast also rejects negative values. This avoids out-of-bounds access. Fixes the below smatch error: dcn*_resource.c: stream_encoder_create() may index stream_enc_regs[eng_id] out of bounds (size 5). drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn351/dcn351_resource.c 1246 static struct stream_encoder *dcn35_stream_encoder_create( 1247 enum engine_id eng_id, 1248 struct dc_context *ctx) 1249 { ... 1255 1256 /* Mapping of VPG, AFMT, DME register blocks to DIO block instance */ 1257 if (eng_id <= ENGINE_ID_DIGF) { ENGINE_ID_DIGF is 5. should <= be <? Unrelated but, ugh, why is Smatch saying that "eng_id" can be negative? end_id is type signed long, but there are checks in the caller which prevent it from being negative. 1258 vpg_inst = eng_id; 1259 afmt_inst = eng_id; 1260 } else 1261 return NULL; 1262 ... 1281 1282 dcn35_dio_stream_encoder_construct(enc1, ctx, ctx->dc_bios, 1283 eng_id, vpg, afmt, --> 1284 &stream_enc_regs[eng_id], ^^^^^^^^^^^^^^^^^^^^^^^ This stream_enc_regs[] array has 5 elements so we are one element beyond the end of the array. ... 1287 return &enc1->base; 1288 } v2: use explicit bounds check as suggested by Roman/Dan; avoid unsigned int cast v3: The compiler already knows how to compare the two values, so the cast (int) is not needed. (Roman) Fixes: 2728e9c7c842 ("drm/amd/display: add DC changes for DCN351") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Mario Limonciello <superm1@kernel.org> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Cc: Roman Li <roman.li@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Reject cursor plane on DCE when scaled differently than primaryTimur Kristóf1-3/+8
[ Upstream commit 41af6215cdbcecd12920f211239479027904abf3 ] Currently DCE doesn't support the overlay cursor, so the dm_crtc_get_cursor_mode() function returns DM_CURSOR_NATIVE_MODE unconditionally. The outcome is that it doesn't check for the conditions that would necessitate the overlay cursor, meaning that it doesn't reject cases where the native cursor mode isn't supported on DCE. Remove the early return from dm_crtc_get_cursor_mode() for DCE and instead let it perform the necessary checks and return DM_CURSOR_OVERLAY_MODE. Add a later check that rejects when DM_CURSOR_OVERLAY_MODE would be used with DCE. Fixes: 1b04dcca4fb1 ("drm/amd/display: Introduce overlay cursor mode") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4600 Suggested-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdkfd: Fix watch_id bounds checking in debug address watch v2Srinivasan Shanmugam1-8/+12
[ Upstream commit 5a19302cab5cec7ae7f1a60c619951e6c17d8742 ] The address watch clear code receives watch_id as an unsigned value (u32), but some helper functions were using a signed int and checked bits by shifting with watch_id. If a very large watch_id is passed from userspace, it can be converted to a negative value. This can cause invalid shifts and may access memory outside the watch_points array. drm/amdkfd: Fix watch_id bounds checking in debug address watch v2 Fix this by checking that watch_id is within MAX_WATCH_ADDRESSES before using it. Also use BIT(watch_id) to test and clear bits safely. This keeps the behavior unchanged for valid watch IDs and avoids undefined behavior for invalid ones. Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:448 kfd_dbg_trap_clear_dev_address_watch() error: buffer overflow 'pdd->watch_points' 4 <= u32max user_rl='0-3,2147483648-u32max' uncapped drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c 433 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd, 434 uint32_t watch_id) 435 { 436 int r; 437 438 if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id)) kfd_dbg_owns_dev_watch_id() doesn't check for negative values so if watch_id is larger than INT_MAX it leads to a buffer overflow. (Negative shifts are undefined). 439 return -EINVAL; 440 441 if (!pdd->dev->kfd->shared_resources.enable_mes) { 442 r = debug_lock_and_unmap(pdd->dev->dqm); 443 if (r) 444 return r; 445 } 446 447 amdgpu_gfx_off_ctrl(pdd->dev->adev, false); --> 448 pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch( 449 pdd->dev->adev, 450 watch_id); v2: (as per, Jonathan Kim) - Add early watch_id >= MAX_WATCH_ADDRESSES validation in the set path to match the clear path. - Drop the redundant bounds check in kfd_dbg_owns_dev_watch_id(). Fixes: e0f85f4690d0 ("drm/amdkfd: add debug set and clear address watch points operation") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jonathan Kim <jonathan.kim@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdgpu: Fix missing unwind in amdgpu_ib_schedule() error pathSrinivasan Shanmugam1-1/+1
[ Upstream commit ba038065655c45728be346d0b174a6da08d8a5c5 ] amdgpu_ib_schedule() returns early after calling amdgpu_ring_undo(). This skips the common free_fence cleanup path. Other error paths were already changed to use goto free_fence, but this one was missed. Change the early return to goto free_fence so all error paths clean up the same way. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c:232 amdgpu_ib_schedule() warn: missing unwind goto? drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 124 int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs, 125 struct amdgpu_ib *ibs, struct amdgpu_job *job, 126 struct dma_fence **f) 127 { ... 224 225 if (ring->funcs->insert_start) 226 ring->funcs->insert_start(ring); 227 228 if (job) { 229 r = amdgpu_vm_flush(ring, job, need_pipe_sync); 230 if (r) { 231 amdgpu_ring_undo(ring); --> 232 return r; The patch changed the other error paths to goto free_fence but this one was accidentally skipped. 233 } 234 } 235 236 amdgpu_ring_ib_begin(ring); ... 338 339 free_fence: 340 if (!job) 341 kfree(af); 342 return r; 343 } Fixes: f903b85ed0f1 ("drm/amdgpu: fix possible fence leaks from job structure") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amd/display: Fix dc_link NULL handling in HPD initSrinivasan Shanmugam1-5/+4
[ Upstream commit 226a40c06a183abaeb7529a4f54d6c203bd14407 ] amdgpu_dm_hpd_init() may see connectors without a valid dc_link. The code already checks dc_link for the polling decision, but later unconditionally dereferences it when setting up HPD interrupts. Assign dc_link early and skip connectors where it is NULL. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_irq.c:940 amdgpu_dm_hpd_init() error: we previously assumed 'dc_link' could be null (see line 931) drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_irq.c 923 /* 924 * Analog connectors may be hot-plugged unlike other connector 925 * types that don't support HPD. Only poll analog connectors. 926 */ 927 use_polling |= 928 amdgpu_dm_connector->dc_link && ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The patch adds this NULL check but hopefully it can be removed 929 dc_connector_supports_analog(amdgpu_dm_connector->dc_link->link_id.id); 930 931 dc_link = amdgpu_dm_connector->dc_link; dc_link assigned here. 932 933 /* 934 * Get a base driver irq reference for hpd ints for the lifetime 935 * of dm. Note that only hpd interrupt types are registered with 936 * base driver; hpd_rx types aren't. IOW, amdgpu_irq_get/put on 937 * hpd_rx isn't available. DM currently controls hpd_rx 938 * explicitly with dc_interrupt_set() 939 */ --> 940 if (dc_link->irq_source_hpd != DC_IRQ_SOURCE_INVALID) { ^^^^^^^^^^^^^^^^^^^^^^^ If it's NULL then we are trouble because we dereference it here. 941 irq_type = dc_link->irq_source_hpd - DC_IRQ_SOURCE_HPD1; 942 /* 943 * TODO: There's a mismatch between mode_info.num_hpd 944 * and what bios reports as the # of connectors with hpd Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Cc: Timur Kristóf <timur.kristof@gmail.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Mario Limonciello <superm1@kernel.org> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Cc: Roman Li <roman.li@amd.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysPCI: Validate window resource type in pbus_select_window_for_type()Kai-Heng Feng1-3/+10
[ Upstream commit e5f72cb9cea599dc9f5a9b80a33560a1d06f01cc ] After ebe091ad81e1 ("PCI: Use pbus_select_window_for_type() during IO window sizing") and ae88d0b9c57f ("PCI: Use pbus_select_window_for_type() during mem window sizing"), many bridge windows can't get resources assigned: pci 0006:05:00.0: bridge window [??? 0x00001000-0x00001fff flags 0x20080000]: can't assign; no space pci 0006:05:00.0: bridge window [??? 0x00001000-0x00001fff flags 0x20080000]: failed to assign Those commits replace find_bus_resource_of_type() with pbus_select_window_for_type(), and the latter lacks resource type validation. Add the resource type validation back to pbus_select_window_for_type() to match the original behavior. Fixes: 74afce3dfcba ("PCI: Add bridge window selection functions") Link: https://bugzilla.kernel.org/show_bug.cgi?id=221072 Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20260210142058.82701-1-kaihengf@nvidia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/i915/acpi: free _DSM package when no connectorsKaushlendra Kumar1-0/+1
[ Upstream commit 57b85fd53fccfdf14ce7b36d919c31aa752255f8 ] acpi_evaluate_dsm_typed() returns an ACPI package in pkg. When pkg->package.count == 0, we returned without freeing pkg, leaking memory. Free pkg before returning on the empty case. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Fixes: 337d7a1621c7 ("drm/i915: Fix invalid access to ACPI _DSM objects") Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/20260109032549.1826303-1-kaushlendra.kumar@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit c0a27a0ca8a34e96d08bb05a2c5d5ccf63fb8dc0) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysregulator: mt6363: Fix interrmittent timeoutAdam Ford1-1/+8
[ Upstream commit 1a4b0c999101b2532723f9bd9818b70ffa7580f4 ] Sometimes, the mt6363 regulator would fail to initialize and return with a TIMEOUT error, so add an extra instruction to wake up the bus before issuing the commands. Fixes: 3c36965df808 ("regulator: Add support for MediaTek MT6363 SPMI PMIC Regulators") Signed-off-by: Adam Ford <aford173@gmail.com> Link: https://patch.msgid.link/20260210053708.17239-4-aford173@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysmshv: fix SRCU protection in irqfd resampler ack handlerLi RongQing1-2/+3
[ Upstream commit 2e7577cd5ddc1f86d1b6c48caf3cfa87dbb14e34 ] Replace hlist_for_each_entry_rcu() with hlist_for_each_entry_srcu() in mshv_irqfd_resampler_ack() to correctly handle SRCU-protected linked list traversal. The function uses SRCU (sleepable RCU) synchronization via partition->pt_irq_srcu, but was incorrectly using the RCU variant for list iteration. This could lead to race conditions when the list is modified concurrently. Also add srcu_read_lock_held() assertion as required by hlist_for_each_entry_srcu() to ensure we're in the proper read-side critical section. Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs") Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdgpu: clean up the amdgpu_cs_parser_bosSunil Khatri1-2/+4
[ Upstream commit f025a2b8d93358467b8e8f4b3a617e88c5f02fab ] In low memory conditions, kmalloc can fail. In such conditions unlock the mutex for a clean exit. We do not need to amdgpu_bo_list_put as it's been handled in the amdgpu_cs_parser_fini. Fixes: 737da5363cc0 ("drm/amdgpu: update the functions to use amdgpu version of hmm") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202602030017.7E0xShmH-lkp@intel.com/ Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdgpu/sdma6: enable queue resets unconditionallyAlex Deucher1-12/+3
[ Upstream commit 56423871e9eef1dd069bddef895207fa5ce275fe ] There is no firmware version dependency. This also enables sdma queue resets on all SDMA 6.x based chips. Fixes: 59fd50b8663b ("drm/amdgpu: Add sysfs interface for sdma reset mask") Cc: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdgpu/sdma5.2: enable queue resets unconditionallyAlex Deucher1-19/+3
[ Upstream commit 314d30ad50622fc0d70da71509f9dff21545be14 ] There is no firmware version dependency. This also enables sdma queue resets on all SDMA 5.2.x based chips. Fixes: 59fd50b8663b ("drm/amdgpu: Add sysfs interface for sdma reset mask") Cc: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdgpu/sdma5: enable queue resets unconditionallyAlex Deucher1-12/+3
[ Upstream commit 46a2cb7d24f21132e970cab52359210c3f5ea3c6 ] There is no firmware version dependency. Fixes: 59fd50b8663b ("drm/amdgpu: Add sysfs interface for sdma reset mask") Cc: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdgpu: Fix memory leak in amdgpu_ras_init()Zilin Guan1-1/+1
[ Upstream commit ee41e5b63c8210525c936ee637a2c8d185ce873c ] When amdgpu_nbio_ras_sw_init() fails in amdgpu_ras_init(), the function returns directly without freeing the allocated con structure, leading to a memory leak. Fix this by jumping to the release_con label to properly clean up the allocated memory before returning the error code. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: fdc94d3a8c88 ("drm/amdgpu: Rework pcie_bif ras sw_init") Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdgpu: Use kvfree instead of kfree in amdgpu_gmc_get_nps_memranges()Zilin Guan1-1/+1
[ Upstream commit 0c44d61945c4a80775292d96460aa2f22e62f86c ] amdgpu_discovery_get_nps_info() internally allocates memory for ranges using kvcalloc(), which may use vmalloc() for large allocation. Using kfree() to release vmalloc memory will lead to a memory corruption. Use kvfree() to safely handle both kmalloc and vmalloc allocations. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: b194d21b9bcc ("drm/amdgpu: Use NPS ranges from discovery table") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysdrm/amdgpu: Fix memory leak in amdgpu_acpi_enumerate_xcc()Zilin Guan1-1/+3
[ Upstream commit c9be63d565789b56ca7b0197e2cb78a3671f95a8 ] In amdgpu_acpi_enumerate_xcc(), if amdgpu_acpi_dev_init() returns -ENOMEM, the function returns directly without releasing the allocated xcc_info, resulting in a memory leak. Fix this by ensuring that xcc_info is properly freed in the error paths. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: 4d5275ab0b18 ("drm/amdgpu: Add parsing of acpi xcc objects") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnet/mlx5e: Use unsigned for mlx5e_get_max_num_channelsCosmin Ratiu1-1/+2
[ Upstream commit 57a94d4b22b0c6cc5d601e6b6238d78fb923d991 ] The max number of channels is always an unsigned int, use the correct type to fix compilation errors done with strict type checking, e.g.: error: call to ‘__compiletime_assert_1110’ declared with attribute error: min(mlx5e_get_devlink_param_num_doorbells(mdev), mlx5e_get_max_num_channels(mdev)) signedness error Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260218072904.1764634-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnet/mlx5e: Fix deadlocks between devlink and netdev instance locksCosmin Ratiu4-58/+61
[ Upstream commit 83ac0304a2d77519dae1e54c9713cbe1aedf19c9 ] In the mentioned "Fixes" commit, various work tasks triggering devlink health reporter recovery were switched to use netdev_trylock to protect against concurrent tear down of the channels being recovered. But this had the side effect of introducing potential deadlocks because of incorrect lock ordering. The correct lock order is described by the init flow: probe_one -> mlx5_init_one (acquires devlink lock) -> mlx5_init_one_devl_locked -> mlx5_register_device -> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe -> register_netdev (acquires rtnl lock) -> register_netdevice (acquires netdev lock) => devlink lock -> rtnl lock -> netdev lock. But in the current recovery flow, the order is wrong: mlx5e_tx_err_cqe_work (acquires netdev lock) -> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report -> devlink_health_report (acquires devlink lock => boom!) -> devlink_health_reporter_recover -> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx -> mlx5e_tx_reporter_err_cqe_recover The same pattern exists in: mlx5e_reporter_rx_timeout mlx5e_reporter_tx_ptpsq_unhealthy mlx5e_reporter_tx_timeout Fix these by moving the netdev_trylock calls from the work handlers lower in the call stack, in the respective recovery functions, where they are actually necessary. Fixes: 8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance locking") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260218072904.1764634-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysnet/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_eventGal Pressman1-1/+2
[ Upstream commit 9854b243ce4225328d0b32fdc998d35b6952d3f7 ] The macsec_aso_set_arm_event function calls mlx5_aso_poll_cq once without a retry loop. If the CQE is not immediately available after posting the WQE, the function fails unnecessarily. Use read_poll_timeout() to poll 3-10 usecs for CQE, consistent with other ASO polling code paths in the driver. Fixes: 739cfa34518e ("net/mlx5: Make ASO poll CQ usable in atomic context") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260218072904.1764634-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>