|
Pull drm fixes from Dave Airlie:
"These are just the fixes from our fixes branch, all pretty small and
scattered.
sysfb:
- drm/sysfb truncation and alignment fixes
edid:
- fix edid OOB read in tile parsing
- increase displayid topology id to correct size
nouveau:
- fix error handling paths in nouveau
amdxdna:
- get_bo_info fix
ivpu:
- fix leak when error handling in ivpu"
* tag 'drm-fixes-2026-06-27' of https://gitlab.freedesktop.org/drm/kernel:
drm/sysfb: Avoid truncating maximum stride
drm/sysfb: Return errno code from drm_sysfb_get_visible_size()
drm/sysfb: Avoid possible truncation with calculating visible size
drm/sysfb: Do not page-align visible size of the framebuffer
drm/edid: fix OOB read in drm_parse_tiled_block()
drm/nouveau: fix reversed error cleanup order in ucopy functions
drm/nouveau/acr: fix missing nvkm_done() in error path of nvkm_acr_oneinit()
accel/amdxdna: Use caller client for debug BO sync
drm/displayid: fix Tiled Display Topology ID size
accel/ivpu: fix HWS command queue leak on registration failure
|
|
Pull drm merge window fixes from Dave Airlie:
"This is the merge window fixes from our next tree, i915/xe and amdgpu
make up all of it.
I've got a separate fixes pull from our fixes branch arriving after
this.
i915:
- Fix corrupted display output on GLK, #16209
- Add missing Spectre mitigation for parallel submit IOCTL
- MTL+ fix for DP resume
- clear CRTC blobs after dropping refs
- fix sharpness filter on DP MST
xe:
- Set TTM beneficial order to 9 in Xe
- Several error path cleanups
- Fix TDR for unstarted jobs on kernel queues
- Several TLB invalidation fixes related to suspending LR queues
- Some small RAS fixes
- Multi-queue suspend fix for LR queues
- Revert inclusion of NVL_S firmware
amdgpu:
- devcoredump fixes
- SMU15 fix
- Various irq put/get imbalance cleanup fixes
- 8K panel fix
- DCN3.5 fix
- lockdep fix
- Cleaner shader sysfs IB overflow fix
- Async flip fixes
- GET_MAPPING_INFO fix
- CP_GFX_SHADOW fix
- Ctx pstate handling fix
- GTT bo move handling fixes
- Old UVD BO placement fixes
- GC9 mode2 reset fix
- IH6.1 version fix
- Soft IH ring fix
amdkfd:
- Fix doorbell/mmio double unpin on free
- CRIU fixes
- SMI event fixes
- Sysfs teardown fix
- Various boundary checking fixes
- Various error checking fixes
- SVM fix"
* tag 'drm-next-2026-06-27' of https://gitlab.freedesktop.org/drm/kernel: (52 commits)
drm/i915/cdclk: Fix up CDCLK_FREQ_DECIMAL without a full PLL re-enable
drm/i915/gem: Add missing nospec on parallel submit slot
drm/amdgpu: Use system unbound workqueue for soft IH ring
amdgpu/ih6.1: Fix minor version
drm/amdkfd: Use exclusive bounds for SVM split alignment checks
drm/amdgpu/gfx9: Fix Ring and IB test fail after mode2
drm/amdgpu/uvd: Fix forcing MSG, FB BOs into VCPU segment when it isn't at 0 (v2)
drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older
drm/amdgpu: Fix amdgpu_bo_move() when old_mem and new_mem are both GTT
drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions
drm/amdgpu: Fix context pstate override handling
drm/amdkfd: Use memdup_array_user to copy data from/to user space at kfd ioctls
drm/amdkfd: check find_first_zero_bit before __set_bit on kfd->doorbell_bitmap
drm/amdkfd: Let driver decide buffer size at AMDKFD_IOC_GET_DMABUF_INFO ioctl
drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format
drm/amdgpu: convert amdgpu_vm_lock_by_pasid() to drm_exec
drm/amdgpu: Don't use UTS_RELEASE directly
drm/amdkfd: Fix NULL deref during sysfs teardown
drm/amdgpu: validate CP_GFX_SHADOW chunk size in CS pass1
drm/amdgpu: check amdgpu_vm_bo_find() result in GET_MAPPING_INFO
...
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v7.2:
- drm/sysfb truncation and alignment fixes.
- fix edid OOB read.
- fix error handling paths in nouveau
- amdxdna get_bo_info fix.
- increase displayid topology id to correct size.
- fix leak when error handling in ivpu.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/2d17f718-43f5-4772-9c04-a975c9ad4bc3@linux.intel.com
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
- Fix corrupted display output on GLK, #16209 (Ville)
- Add missing Spectre mitigation for parallel submit IOCTL (Joonas)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patch.msgid.link/ajzIhInnHnGCwMlu@jlahtine-mobl
|
|
The GOP (and even Bspec on some platforms) is a bit inconsistent
on what the CDCLK_FREQ_DECIMAL divider should be. Currently any
mismatch there causes a full CDCLK PLL disable+re-enable, which
we really don't want to do if any displays are currently active.
Let's instead just reprogram CDCLK_FREQ_DECIMAL when that is the
only thing amiss. For any other (more serious) mismatch we still
punt to the full PLL reprogramming.
We also need to tweak the bxt_cdclk_cd2x_pipe() stuff a bit to
consistently select pipe==NONE since we have no idea which pipes
are enabled at this point. Since we're not actually changing the
CDCLK frequency here we don't need to sync the update to any
pipe.
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/16209
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20260612173653.7830-2-ville.syrjala@linux.intel.com
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
(cherry picked from commit 3f9de66f8acbf8ff45a91b4920605ed10c6b7c06)
Fixes: ba91b9eecb47 ("drm/i915/cdclk: Decouple cdclk from state->modeset")
Fixes: d66a21947e21 ("drm/i915/bxt: Sanitize CDCLK to fix breakage during S4 resume")
Fixes: c73666f394fc ("drm/i915/skl: If needed sanitize bios programmed cdclk")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
Add missing Spectre mitigation for userspace controlled parallel
submission slot.
Discovered using AI-assisted static analysis confirmed by Intel
Product Security.
Reported-by: Martin Hodo <martin.hodo@intel.com>
Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: <stable@vger.kernel.org> # v5.16+
Link: https://patch.msgid.link/20260622132539.165558-1-joonas.lahtinen@linux.intel.com
(cherry picked from commit 15b9353deff3cf72331c387780de3cf9c316b643)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
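The mitigation described above clamps a user-controlled index after the bounds check, so that a mispredicted branch cannot be used to read out of bounds speculatively. The following is a minimal userspace sketch of the idea, modeled on the kernel's generic array_index_mask_nospec(); the slot table, NUM_SLOTS and read_slot() are hypothetical names for illustration and are not taken from the i915 patch.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Branch-free mask: all ones when index < size, zero otherwise.
 * Assumes size <= LONG_MAX and an arithmetic right shift of negative
 * values, as the kernel's generic implementation does.
 */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * CHAR_BIT - 1);
}

#define NUM_SLOTS 8UL			/* hypothetical table size */
static int slots[NUM_SLOTS];		/* hypothetical slot table */

/* Returns the slot contents, or -1 for an out-of-range slot number. */
int read_slot(unsigned long slot)
{
	if (slot >= NUM_SLOTS)
		return -1;

	/* Clamp the index so a mispredicted check cannot index past the table. */
	slot &= index_mask_nospec(slot, NUM_SLOTS);
	return slots[slot];
}
```

In kernel code the same effect is normally obtained with array_index_nospec(index, size) from <linux/nospec.h> rather than an open-coded mask.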
|
|
Passing a maximum as 64-bit type to drm_sysfb_get_validated_int0()
can truncate the value to 32 bits. Use drm_sysfb_get_validated_size0(),
which uses 64-bit arithmetic. Then test the returned stride against
the limits of int to avoid truncations in the returned value. A valid
stride is in the range of [1, INT_MAX] inclusive.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/dri-devel/20260617114016.5A5991F000E9@smtp.kernel.org/
Fixes: 32ae90c66fb6 ("drm/sysfb: Add efidrm for EFI displays")
Fixes: a84eb6abe2b6 ("drm/sysfb: Add vesadrm for VESA displays")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.16+
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patch.msgid.link/20260618084327.46567-5-tzimmermann@suse.de
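As a rough illustration of the range check described above, the sketch below validates a 64-bit stride against [1, INT_MAX] before narrowing it to int. checked_stride() and truncated_stride() are hypothetical helpers written for this example; they are not the drm_sysfb functions.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* What goes wrong without the check: the upper 32 bits are silently dropped. */
uint32_t truncated_stride(uint64_t stride)
{
	return (uint32_t)stride;
}

/* Returns the stride as an int, or -EINVAL if it is outside [1, INT_MAX]. */
int checked_stride(uint64_t stride)
{
	if (stride < 1 || stride > INT_MAX)
		return -EINVAL;
	return (int)stride;
}
```

With a stride of 0x100000040, the truncating variant yields 0x40, which looks plausible but is wrong, while the checked variant rejects the value.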
|
|
Change the return type of drm_sysfb_get_visible_size() to s64 so
that it returns a possible errno code from _get_validated_size0().
Fix callers to handle the errno code.
The currently returned unsigned type converts an errno code to a
very large size value, which drivers interpret as visible size of
the system framebuffer. Later efforts to reserve the framebuffer
resource fail.
The bug has been present since efidrm and vesadrm got merged. It
was then part of each driver.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 32ae90c66fb6 ("drm/sysfb: Add efidrm for EFI displays")
Fixes: a84eb6abe2b6 ("drm/sysfb: Add vesadrm for VESA displays")
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.16+
Link: https://patch.msgid.link/20260618084327.46567-4-tzimmermann@suse.de
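The effect of returning an errno code through an unsigned type can be sketched in userspace as follows. All function names here are hypothetical stand-ins; only the pattern (signed 64-bit return so callers can see the error) follows the description above.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical validator: the value on success, -EINVAL if 0 or above max. */
int64_t validated_size(uint64_t value, uint64_t max)
{
	if (value == 0 || value > max || value > INT64_MAX)
		return -EINVAL;
	return (int64_t)value;
}

/* Unsigned return type: -EINVAL becomes a huge "size" the caller cannot detect. */
uint64_t visible_size_unsigned(uint64_t value, uint64_t max)
{
	return (uint64_t)validated_size(value, max);
}

/* Signed 64-bit return type: the caller can test for a negative errno code. */
int64_t visible_size_signed(uint64_t value, uint64_t max)
{
	return validated_size(value, max);
}
```

A caller of the signed variant would check for a negative result and propagate it, instead of trying to reserve a framebuffer of nearly 2^64 bytes.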
|
|
Calculating the visible size of the system framebuffer can result in
truncation of the result. The calculation uses 32-bit arithmetic,
which can overflow if the values for height and stride are large. Fix
the issue by multiplying with mul_u32_u32().
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 32ae90c66fb6 ("drm/sysfb: Add efidrm for EFI displays")
Fixes: a84eb6abe2b6 ("drm/sysfb: Add vesadrm for VESA displays")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/dri-devel/20260617114027.1F2A71F000E9@smtp.kernel.org/
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.16+
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patch.msgid.link/20260618084327.46567-3-tzimmermann@suse.de
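The overflow is easy to reproduce outside the kernel. In the sketch below, the widening cast plays the role that mul_u32_u32() plays in the kernel, namely a 32x32 to 64-bit multiplication; the function names are made up for this example.

```c
#include <stdint.h>

/* 32-bit multiplication: the product wraps modulo 2^32. */
uint32_t visible_size_32(uint32_t height, uint32_t stride)
{
	return height * stride;
}

/* Widen one operand first so the multiplication is done in 64 bits. */
uint64_t visible_size_64(uint32_t height, uint32_t stride)
{
	return (uint64_t)height * stride;
}
```

For height = stride = 65536, the 32-bit version returns 0 while the 64-bit version returns 4294967296.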
|
|
Only return the actually visible size of the system framebuffer in
drm_sysfb_get_visible_size_si(). Drivers use this size value for
reserving access to framebuffer memory. Increasing the value can
make later attempts to do so fail.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 32ae90c66fb6 ("drm/sysfb: Add efidrm for EFI displays")
Fixes: a84eb6abe2b6 ("drm/sysfb: Add vesadrm for VESA displays")
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.16+
Link: https://patch.msgid.link/20260618084327.46567-2-tzimmermann@suse.de
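A small sketch of why rounding up can break the later reservation, assuming a 4 KiB page size and a hypothetical framebuffer resource whose size is not a multiple of the page size. Both helpers are illustrative and not part of drm_sysfb.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

/* Round a size up to the next page boundary. */
uint64_t page_align_up(uint64_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* A request can only be reserved if it fits inside the resource. */
bool request_fits(uint64_t resource_size, uint64_t request_size)
{
	return request_size <= resource_size;
}
```

An 800x600 framebuffer at 4 bytes per pixel occupies 1920000 bytes. Page-aligning that gives 1921024 bytes, which no longer fits in a resource of exactly 1920000 bytes.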
|
|
drm_parse_tiled_block() casts the DisplayID block to a
struct displayid_tiled_block and reads the full fixed layout up to
tile->topology_id[7] without checking block->num_bytes. The DisplayID
iterator only validates the declared payload length, so a crafted EDID
can advertise a tiled-display block (tag DATA_BLOCK_TILED_DISPLAY, or
DATA_BLOCK_2_TILED_DISPLAY_TOPOLOGY for v2.0) with a small num_bytes at
the end of a DisplayID extension. The read then runs past the end of the
exact-sized kmemdup()'d EDID allocation, a heap out-of-bounds read.
Reject blocks shorter than the spec's 22-byte tiled payload before
reading the fixed struct, as drm_parse_vesa_mso_data() already does.
BUG: KASAN: slab-out-of-bounds in drm_edid_connector_update
Read of size 2 at addr ffff888010077700 by task exploit/147
dump_stack_lvl (lib/dump_stack.c:94 ...)
print_report (mm/kasan/report.c:378 ...)
kasan_report (mm/kasan/report.c:595)
drm_edid_connector_update (drivers/gpu/drm/drm_edid.c:7581)
bochs_connector_helper_get_modes (drivers/gpu/drm/tiny/bochs.c:574)
drm_helper_probe_single_connector_modes (drivers/gpu/drm/drm_probe_helper.c:426)
status_store (drivers/gpu/drm/drm_sysfs.c:219)
...
vfs_write (fs/read_write.c:595 fs/read_write.c:688)
ksys_write (fs/read_write.c:740)
Fixes: 40d9b043a89e ("drm/connector: store tile information from displayid (v3)")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/20260615184737.899892-1-xmei5@asu.edu
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
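The length check described above can be sketched in plain C as follows. This assumes a 3-byte block header (tag, revision, num_bytes) followed by the payload, and uses the 22-byte tiled payload size quoted in the message; parse_tiled() and its buffer interface are hypothetical and do not mirror the drm_edid.c code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_HDR_LEN		3	/* assumed header: tag, rev, num_bytes */
#define TILED_PAYLOAD_LEN	22	/* tiled payload size from the message */

/*
 * Copies the fixed tiled payload into out and returns 0, or returns -1
 * if the block declares or provides fewer bytes than the fixed layout.
 */
int parse_tiled(const uint8_t *block, size_t avail,
		uint8_t out[TILED_PAYLOAD_LEN])
{
	uint8_t num_bytes;

	if (avail < BLOCK_HDR_LEN)
		return -1;

	num_bytes = block[2];

	/* Reject short blocks before touching the fixed-size layout. */
	if (num_bytes < TILED_PAYLOAD_LEN)
		return -1;

	/* The declared payload must also fit in the remaining buffer. */
	if (avail - BLOCK_HDR_LEN < num_bytes)
		return -1;

	memcpy(out, block + BLOCK_HDR_LEN, TILED_PAYLOAD_LEN);
	return 0;
}
```

Without the num_bytes check, a block that declares a 4-byte payload at the very end of the buffer would still have all 22 bytes read.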
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-7.2-2026-06-19:
amdgpu:
- devcoredump fixes
- SMU15 fix
- Various irq put/get imbalance cleanup fixes
- 8K panel fix
- DCN3.5 fix
- lockdep fix
- Cleaner shader sysfs IB overflow fix
- Async flip fixes
- GET_MAPPING_INFO fix
- CP_GFX_SHADOW fix
- Ctx pstate handling fix
- GTT bo move handling fixes
- Old UVD BO placement fixes
- GC9 mode2 reset fix
- IH6.1 version fix
- Soft IH ring fix
amdkfd:
- Fix doorbell/mmio double unpin on free
- CRIU fixes
- SMI event fixes
- Sysfs teardown fix
- Various boundary checking fixes
- Various error checking fixes
- SVM fix
radeon:
- r100_copy_blit fix for large BOs
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260619152610.776982-1-alexander.deucher@amd.com
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
- Set TTM beneficial order to 9 in Xe
- Several error path cleanups
- Fix TDR for unstarted jobs on kernel queues
- Several TLB invalidation fixes related to suspending LR queues
- Some small RAS fixes
- Multi-queue suspend fix for LR queues
- Revert inclusion of NVL_S firmware
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/ajLy2brwvOZEFNNN@gsse-cloud1.jf.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "taskstats: fix TGID dead-thread stat retention" (Yiyang Chen)
Fix a taskstats TGID aggregation bug where fields added in the TGID
query path were not preserved after thread exit, and add a kselftest
covering the regression.
- "lib/tests: string_helpers: Slight improvements" (Andy Shevchenko)
Improve lib/tests/string_helpers_kunit.c a little
- "lib/base64: decode fixes" (Josh Law)
Address minor issues in lib/base64.c
- "selftests/filelock: Make output more kselftestish" (Mark Brown)
Make the output from the ofdlocks test a bit easier for tooling to
work with. Also ignore the generated file
- "uaccess: unify inline vs outline copy_{from,to}_user() selection"
(Yury Norov)
Simplify the usercopy code by removing the selectability of inlining
copy_{from,to}_user().
- "ocfs2: validate inline xattr header consumers" (ZhengYuan Huang)
Fix a number of possible issues in the ocfs2 xattr code
- "lib and lib/cmdline enhancements" (Dmitry Antipov)
Provide additional robustness checking in the cmdline handling code
and its in-kernel testing and selftests
- "cleanup the RAID6 P/Q library" (Christoph Hellwig)
Clean up the RAID6 P/Q library to match the recent updates to the
RAID 5 XOR library and other CRC/crypto libraries
- "ocfs2: harden inode validators against forged metadata" (Michael
Bommarito)
Add three structural checks to OCFS2 dinode validation so malformed
on-disk fields are rejected before ocfs2_populate_inode() copies them
into the in-core inode
- "lib/raid: replace __get_free_pages() call with kmalloc()" (Mike
Rapoport)
Clean up the lib/raid code by using kmalloc() in more places
* tag 'mm-nonmm-stable-2026-06-21-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (108 commits)
ocfs2: fix circular locking dependency in ocfs2_dio_end_io_write
ocfs2: fix NULL h_transaction deref in ocfs2_assure_trans_credits
lib: interval_tree_test: validate benchmark parameters
ocfs2: avoid moving extents to occupied clusters
treewide: fix transposed "sign" typos and update spelling.txt
ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec
fat: reject BPB volumes whose data area starts beyond total sectors
selftests/uevent: increase __UEVENT_BUFFER_SIZE to avoid ENOBUFS on busy systems
lib/test_firmware: allocate the configured into_buf size
fs: efs: remove unneeded debug prints
checkpatch: suppress warnings when Reported-by: is followed by Link:
MAINTAINERS: add Alexander as a kcov reviewer
mailmap: update Alexander Sverdlin's Email addresses
fs: fat: inode: replace sprintf() with scnprintf()
ocfs2: fix out-of-bounds write in ocfs2_remove_refcount_extent
ocfs2: fix race between ocfs2_control_install_private() and ocfs2_control_release()
ocfs2/dlm: require a ref for locking_state debugfs open
ocfs2: reject FITRIM ranges shorter than a cluster
ocfs2: validate fast symlink target during inode read
ocfs2: add journal NULL check in ocfs2_checkpoint_inode()
...
|
|
nouveau_uvmm_vm_bind_ucopy() and nouveau_exec_ucopy() place their error
cleanup labels in allocation order rather than reverse allocation order.
On a u_memcpya() failure for in_sync.s, the goto to err_free_ops (or
err_free_pushs) frees the first allocation and then falls through to
err_free_ins, which calls u_free() on args->in_sync.s.
Since args->in_sync.s still holds the ERR_PTR returned by the failed
u_memcpya(), and ERR_PTR values are not caught by ZERO_OR_NULL_PTR(),
kvfree() proceeds to dereference it, which can result in a kernel oops.
A failure for out_sync.s instead jumps to err_free_ins and skips freeing
the first allocation, leading to a memory leak.
Fix by swapping the cleanup label order so resources are freed in the
correct reverse allocation sequence.
Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Link: https://patch.msgid.link/SYBPR01MB7881484D91A6F80271415F71AF1A2@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
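The corrected label ordering can be sketched in userspace as below. The allocator wrappers, the struct and the fail_step parameter are hypothetical test scaffolding; the sketch also uses NULL for failure, whereas the nouveau code deals with ERR_PTR values, so it only illustrates the reverse-order unwinding.

```c
#include <stdlib.h>

static int live_allocs;	/* counts allocations not yet freed */

static void *sketch_alloc(int should_fail)
{
	void *p = should_fail ? NULL : malloc(16);

	if (p)
		live_allocs++;
	return p;
}

static void sketch_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

struct sketch_args {
	void *ops, *ins, *outs;
};

/* fail_step selects which allocation fails (1..3); 0 means none. */
int sketch_ucopy(struct sketch_args *a, int fail_step)
{
	a->ops = sketch_alloc(fail_step == 1);
	if (!a->ops)
		return -1;

	a->ins = sketch_alloc(fail_step == 2);
	if (!a->ins)
		goto err_free_ops;	/* only ops exists at this point */

	a->outs = sketch_alloc(fail_step == 3);
	if (!a->outs)
		goto err_free_ins;	/* ins and ops both exist */

	return 0;

	/* Labels in reverse allocation order: later resources are freed first. */
err_free_ins:
	sketch_free(a->ins);
err_free_ops:
	sketch_free(a->ops);
	return -1;
}

int sketch_live_allocs(void)
{
	return live_allocs;
}
```

Each label frees exactly what was allocated before the failing step and then falls through to the labels for earlier allocations, so no path frees an invalid pointer or skips a valid one.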
|
|
In nvkm_acr_oneinit(), nvkm_kmap(acr->wpr) is invoked unconditionally
at line 309 to obtain a mapping reference. Additionally, when both
acr->wpr_fw and acr->wpr_comp are present, a second nvkm_kmap() is
called inside the conditional block. Both mappings are expected to be
released by nvkm_done(acr->wpr) at line 320 before the function returns
successfully.
However, when a mismatch is detected during the loop within the
conditional block, the function returns -EINVAL at line 318 without
calling nvkm_done(). This results in a leak of the kmap reference(s)
acquired earlier.
Fix the issue by invoking nvkm_done(acr->wpr) prior to the early return
to ensure proper release of the mapping references.
Fixes: 22dcda45a3d1 ("drm/nouveau/acr: implement new subdev to replace "secure boot"")
Cc: stable@vger.kernel.org
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20260606155606.77593-1-vulab@iscas.ac.cn
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "selftests/mm: clean up build output and verbosity" (Li Wang)
Remove some noise from the MM selftests build
- "mm: Free contiguous order-0 pages efficiently" (Ryan Roberts)
Speed up the freeing of a batch of 0-order pages by first scanning
them for coalescing opportunities. This is applicable to vfree() and
to the releasing of frozen pages
- "mm/damon: introduce DAMOS failed region quota charge ratio"
(SeongJae Park)
Address a DAMOS usability issue: The DAMOS quota often exhausts
prematurely because it charges for all memory attempted, causing slow
and inconsistent performance when actions fail on unreclaimable
memory.
To fix this, a new feature lets users set a smaller, flexible quota
charge ratio (via a numerator and denominator) for failed regions.
Since failed actions cause less overhead, reducing their quota cost
ensures more predictable and efficient DAMOS processing
- "selftests/cgroup: improve zswap tests robustness and support large
page sizes" (Li Wang)
Fix various spurious failures and improve the overall robustness of
the cgroup zswap selftests
- "fix MAP_DROPPABLE not supported errno" (Anthony Yznaga)
Fix an issue in the mlock selftests on arm32
- "mm: huge_memory: clean up defrag sysfs with shared" (Breno Leitao)
Some maintenance work in the huge_memory code
- "treewide: fixup gfp_t printks" (Brendan Jackman)
Use the special vprintf() gfp_t conversion in various places
- "mm: Fix vmemmap optimization accounting and initialization" (Muchun
Song)
Fix several bugs in the vmemmap optimization, mainly around incorrect
page accounting and memmap initialization in the DAX and memory
hotplug paths. It also fixes pageblock migratetype initialization and
struct page initialization for ZONE_DEVICE compound pages
- "mm/damon: repost non-hotfix reviewed patches in damon/next tree"
A sprinkle of unrelated minor bugfixes for DAMON
- "mm: remove page_mapped()" (David Hildenbrand)
Remove this function from the tree, replacing it with folio_mapped()
- "mm/damon: let DAMON be paused and resumed" (SeongJae Park)
Allow DAMON to be paused and resumed without losing its current state
- "kasan: hw_tags: Disable tagging for stack and page-tables" (Muhammad
Usama Anjum)
Simplify and speed up kasan by removing its ineffective tagging of
stacks and page tables
- "mm/damon/reclaim,lru_sort: monitor all system rams by default"
(SeongJae Park)
Simplify deployment on diverse hardware like NUMA systems by updating
DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the
physical address range covering all System RAM areas by default,
replacing the overly restrictive behavior that only targeted the
single largest memory block to save on negligible overhead
- "mm/damon/sysfs: document filters/ directory as deprecated" (SeongJae
Park)
Update some DAMON docs
- "mm: use spinlock guards for zone lock" (Dmitry Ilvokhin)
Switch zone->lock handling over to using the guard() mechanisms
- "mm/filemap: tighten mmap_miss hit accounting" (fujunjie)
Fix a flaw where the mmap_miss counter over-credited page cache hits
during fault-arounds and page-fault retries. This results in
significant reduction of redundant synchronous mmap readahead I/O,
drastically cutting down execution time and gigabytes read for sparse
random or strided memory access workloads
- "selftests/cgroup: Fix false positive failures in test_percpu_basic"
(Li Wang)
Fix a couple of false-positives in the cgroup kmem selftests
- "mm/damon/reclaim: support monitoring intervals auto-tuning"
(SeongJae Park)
Add a new parameter to DAMON permitting DAMON_RECLAIM to
automatically tune DAMON's sampling and aggregation intervals
- "mm/damon/stat: add kdamond_pid parameter" (SeongJae Park)
Change DAMON_STAT to provide the pid of its kdamond
- "mm/kmemleak: dedupe verbose scan output" (Breno Leitao)
Remove large amounts of duplicated backtraces from the verbose-mode
kmemleak output
- "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" (David
Hildenbrand)
Reduce our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to
removing it entirely in a later series
- "mm/damon: validate min_region_size to be power of 2" (Liew Rui Yan)
Prevent users from passing a non-power-of-2 value of `addr_unit', as
this later results in undesirable behavior
- "mm: document read_pages and simplify usage" (Frederick Mayle)
- "tools/mm/page-types: Fix misc bugs" (Ye Liu)
Fix three issues in tools/mm/page-types.c
- "mm: misc cleanups from __GFP_UNMAPPED series" (Brendan Jackman)
Implement several cleanups in the page allocator and related code
- "mm, swap: swap table phase IV: unify allocation" (Kairui Song)
Unify the allocation and charging of anon and shmem swap in folios,
provide better synchronization, and consolidate the metadata
management, hence dropping the static array and map and improving
performance
- "mm/damon: introduce data attributes monitoring" (SeongJae Park)
Extend DAMON to monitor general data attributes other than accesses
- "mm/vmalloc: free unused pages on vrealloc() shrink" (Shivam Kalra)
Implement the TODO in vrealloc() to unmap and free unused pages when
shrinking across a page boundary
- "mm/damon: documentation and comment fixes" (niecheng)
- "remove mmap_action success, error hooks" (Lorenzo Stoakes)
Eliminate custom hooks from mmap_action by removing the problematic
success_hook which allowed drivers to improperly access uninitialized
VMAs. It replaces the error_hook with a simple error-code field and
updates the memory char driver accordingly
- "mm/damon: minor improvements for code readability and tests"
(SeongJae Park)
- "mm/damon: fix macro arguments and clarify quota goals doc" (Maksym
Shcherba)
- "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" (Mike
Rapoport)
- "mm/mglru: improve reclaim loop and dirty folio" (Kairui Song and
others)
Clean up and slightly improve MGLRU's reclaim loop and dirty
writeback handling. Large performance improvements are measured
- "use vma locks for proc/pid/{smaps|numa_maps} reads" (Suren
Baghdasaryan)
Use per-vma locks when reading /proc/pid/smaps and numa_maps to
reduce contention on the central mmap_lock
- "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()"
(Ran Xiaokai)
Some cleanup work in the THP code
- "selftests/memfd: fix compilation warnings" (Konstantin Khorenko)
Fix a few build glitches in the memfd selftest code.
- "memcg: shrink obj_stock_pcp and cache multiple objcgs" (Shakeel
Butt)
Resolve a 68% performance regression caused by NUMA-node cache
thrashing around struct obj_stock_pcp by shrinking its existing
fields and expanding it into a multi-slot array that caches up to
five obj_cgroup pointers per CPU, allowing per-node variants of the
same memcg to coexist within a single 64-byte cache line.
- "zram: writeback fixes" (Sergey Senozhatsky)
Address a couple of unrelated zram writeback issues
- "mm: switch THP shrinker to list_lru" (Johannes Weiner)
Resolve NUMA-awareness issues and streamline callsite interaction by
refactoring and extending the list_lru API to completely replace the
complex, open-coded deferred split queue for Transparent Huge Pages
- "mm: improve large folio readahead for exec memory" (Usama Arif)
Improve large-folio readahead on systems like 64K-page arm64 by
preventing the mmap_miss check from permanently disabling
target-oriented VM_EXEC readahead, and by generalizing the
force_thp_readahead gate to support mappings with any usefully large
maximum folio order under the cache cap.
- "userfaultfd/pagemap: pre-existing fixes" (Kiryl Shutsemau)
Fix a bunch of minor issues in the userfaultfd/pagemap, all of which
were flagged by Sashiko review of proposed new material
- "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and
vmemmap_check_pmd()" (Muchun Song)
Provide generic versions of these two functions so the four
arch-specific implementations can be removed.
- "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device" (Youngjun Park)
Address a uswsusp-vs-swapoff race and reduce the swap device
reference taking/releasing frequency.
- "mm/hmm: A fix and a selftest" (Dev Jain)
* tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries
fs/proc/task_mmu: do not warn on seeing non-migration pmd entry
lib/test_hmm: check alloc_page_vma() return value and handle OOM
mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX
mm/swap: remove redundant swap device reference in alloc/free
mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
mm/filemap: use folio_next_index() for start
vmalloc: fix NULL pointer dereference in is_vm_area_hugepages()
sparc/mm: drop vmemmap_check_pmd helper and use generic code
loongarch/mm: drop vmemmap_check_pmd helper and use generic code
riscv/mm: drop vmemmap_pmd helpers and use generic code
arm64/mm: drop vmemmap_pmd helpers and use generic code
mm/sparse-vmemmap: provide generic vmemmap_set_pmd() and vmemmap_check_pmd()
rust: page: mark Page::nid as inline
userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks
userfaultfd: gate must_wait writability check on pte_present()
mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade
fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole()
fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry()
fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- v4l2:
- core: fix subdev sensor ownership
- subdev: Allow accessing routes with STREAMS client capability
- ctrls: Add validation for HEVC active reference counts and
background detection control
- common: Add YUV24 format info and has_alpha helper
- vb2: Change vb2_read() and vb2_write() return types to ssize_t
- i2c: cvs: Add driver of Intel Computer Vision Sensing Controller (CVS)
- atmel-isc: remove deprecated driver
- cec: Add CEC Latency Indication Protocol (LIP) support
- imon: Add iMON VFD HID OEM v1.2 key mappings
- AVMatrix: new HWS capture driver
- isp4: new AMD capture driver
- qcom:
- iris: Add hierarchical coding, B-frame, and Long-Term Reference
support for encoder
- camss: Add SM6350 platform support
- venus: Add SM6115 platform support
- chips-media: wave5: Add support for Packed YUV422, CBP profile, and
background detection
- csi2rx: Add multistream support and 32 dma chans
- Several cleanups and fixes
* tag 'media/v7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (394 commits)
media: v4l2-fwnode: Fix subdev owner overwritten in v4l2_async_register_subdev_sensor()
media: qcom: iris: vdec: allow GEN2 decoding into 10bit format
media: qcom: iris: vdec: update find_format to handle 8bit and 10bit formats
media: qcom: iris: vdec: update size and stride calculations for 10bit formats
media: qcom: iris: gen2: add support for 10bit decoding
media: qcom: iris: add QC10C & P010 buffer size calculations
media: qcom: iris: add helpers for 8bit and 10bit formats
media: qcom: iris: Fix FPS calculation and VPP FW overhead
media: qcom: camss: vfe-340: Support for PIX client
media: qcom: camss: vfe-340: Proper client handling
media: qcom: camss: csid-340: Enable PIX interface routing
media: qcom: camss: csid-340: Add port-to-interface mapping
media: qcom: camss: csid-340: Switch to generic CSID_CFG/CTRL registers
media: iris: Initialize HFI ops after firmware load in core init
media: iris: drop struct iris_fmt
media: iris: Add platform data for X1P42100
media: iris: Add hardware power on/off ops for X1P42100
media: iris: optimize COMV buffer allocation for VPU3x and VPU4x
media: iris: add FPS calculation and VPP FW overhead in frequency formula
media: qcom: iris: Simplify COMV size calculation
...
|
|
Several comments transpose the letters in "assigned" and "unsigned",
spelling them with "sing" instead of "sign". Correct all of them.
Of these, the misspelling of "assigned" is not yet flagged by checkpatch,
so also add it to scripts/spelling.txt.
The remaining matches of `grep -ri singed` are RISINGEDGE register and
enum names, not typos.
Link: https://lore.kernel.org/20260612181633.734458-1-iamsharduld@gmail.com
Signed-off-by: Shardul Deshpande <iamsharduld@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Allow the kernel to dispatch the soft IH work on other CPUs.
Otherwise it can happen that the soft IH ring fills up
before it actually starts processing anything, which
can easily happen with retry page faults, in which case
the CP repeatedly spams the CPU with a lot of interrupts.
This significantly improves retry page fault handling on
GPUs that don't have the filter CAM and must rely on
software based filtering.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3cdff3c8b93c2834977224d9c2b201fc334dd184)
|
|
Report the correct version of IH v6.1 (previously it showed v6.0).
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 940d33ebbcdebaf095fade86e9c981ad8789aee2)
|
|
SVM ranges use inclusive page indices: prange->last is the last page in
the range. The split-remap logic introduced by commit 448ee45353ef
("drm/amdkfd: Use huge page size to check split svm range alignment")
uses ALIGN_DOWN(prange->last, 512) to determine whether the original
range can contain a 2MB huge-page mapping.
That aligns the last page itself down. Thus a range ending one page
before the next 2MB boundary is classified as if the final 2MB block did
not exist. When such a range is split inside that final block, the
split head or tail can be left off the remap list even though it was
derived from an original range that may have PMD mappings.
Use prange->last + 1 as the exclusive upper bound when computing the
original range's last 2MB-aligned boundary. Then use the actual split
boundary for the head and tail alignment checks: tail->start for a tail
split, and new_start for a head split. new_start is equivalent to
head->last + 1 and directly names the exclusive end of the split head.
Using head->last for the head-side check can both remap a head that ends
exactly one page before a 2MB boundary and miss a head whose split
boundary is one page after such a boundary. Philip Yang pointed out in
the review of the original change that this condition should use
head->last + 1 or new_start.
Xiaogang Chen identified the inclusive-last cause and posted the
candidate fix in the regression thread. With the culprit change active
and the local revert not applied, the unchanged C/HSA reproducer
completes 10/10 runs with this change on an RX 7600 XT.
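The off-by-one described above can be reproduced with plain integer arithmetic. The following is a minimal userspace sketch, not the kernel code: it assumes 4 KiB pages (so 512 pages per 2MB block), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_2MB 512u /* assumes 4 KiB pages: 512 * 4 KiB = 2 MiB */

/* Round x down to a multiple of a; a must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return x & ~(a - 1);
}

/* Boundary derived from the inclusive last page index (the buggy form). */
static uint64_t last_boundary_inclusive(uint64_t last)
{
    return align_down(last, PAGES_PER_2MB);
}

/* Boundary derived from the exclusive end, last + 1 (the corrected form). */
static uint64_t last_boundary_exclusive(uint64_t last)
{
    return align_down(last + 1, PAGES_PER_2MB);
}
```

For a range whose last page is 1023 (one page before the boundary at 1024), the inclusive form yields 512 and hides the final 2MB block, while the exclusive form yields 1024.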
Fixes: 448ee45353ef ("drm/amdkfd: Use huge page size to check split svm range alignment")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4914
Link: https://lore.kernel.org/stable/IA1PR12MB85172F7FE9157C092EDA46A0E3112@IA1PR12MB8517.namprd12.prod.outlook.com/
Link: https://lore.kernel.org/all/32ce2b72-aa16-4202-9f99-92e3cd4408bc@amd.com/
Suggested-by: Xiaogang Chen <xiaogang.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Gerhard Schwanzer <geschw@pm.me>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a60ea15807126b148a328051636977a33ad0e9bb)
Cc: stable@vger.kernel.org
|
|
For the Renoir APU with gfx9, in some test scenarios with ring_reset
disabled, such as accessing an unmapped invalid address, a GPU job
timeout event can be triggered. The driver then uses a Mode2 reset
to reset the GPU, but after Mode2 the compute ring test and IB test fail
randomly. This is because the HQDs of the MECs are always active before
and after Mode2, which causes the MECs to use stale HQDs when they are
unhalted before the driver restores the MQDs. CPC and CPF then remain
stuck after Mode2, and the compute ring and IB tests fail.
So, add sequences to deactivate the HQDs of the MECs in the suspend IP
function of the reset process.
v2: Move all sequences into a new function gfx_v9_0_cp_mode2_clear_state (Ray Huang)
Check the Mode2 reset method in the if condition (Ray Huang)
v3: Move all sequences before Mode2 instead of after Mode2 (Timur Kristóf)
v4: Call amdgpu_gfx_rlc_enter/exit_safe_mode at the beginning and end of
gfx_v9_0_deactivate_kcq_hqd (Alex Deucher)
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c3988a7ad4799514447294f04f063b422e0551df)
Cc: stable@vger.kernel.org
|
|
(v2)
UVD 4.x and older can only access MSG, FEEDBACK buffers from a
specific 256M VRAM segment that the VCPU BO is also located in.
We already modify all placements of the given BO to ensure
the BO is placed within this segment.
Previously, it always assumed that the VCPU segment is
the first 256M of VRAM, even though under some conditions
the VCPU BO could be allocated outside this segment,
which made UVD non-functional as the BOs were
not inside the same segment as the UVD VCPU BO.
Solve that by using the segment where the VCPU BO actually is.
This fixes an issue with UVD failing to initialize on SI/CIK
when resizable BAR is enabled and the VCPU BO is allocated
in a different segment.
v2:
- For other BOs, keep using the same UVD segment as before.
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/3851
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cbfd4d3fc2061a1ec8e9d36e65973ac3e813358a)
Cc: stable@vger.kernel.org
|
|
These UVD versions don't fully support GPUVM and are only
validated to work when their VCPU BO is placed in VRAM.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 01b8dfc0660db5d6cdd62c22dc20f774a26ce853)
Cc: stable@vger.kernel.org
|
|
The UVD code relies on GTT to GTT moves in order to ensure
that its BOs don't cross 256M segments.
Fixes: bfe5e585b44f ("drm/ttm: move last binding into the drivers.")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 21fd45e5e2628d00b478590bcc3d14d3de5d45b6)
Cc: stable@vger.kernel.org
|
|
When testing intersection and compatibility, respect
the actual placement requirements. This is a pre-requisite
for ensuring that UVD CS BOs do not cross 256M segments.
Fixes: ded910f368a5 ("drm/amdgpu: Implement intersect/compatible functions")
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bc06579ca29dee9c245a41b12e39c7bb6938af5d)
Cc: stable@vger.kernel.org
|
|
There are several problems in the context pstate handling code.
The most serious ones are potential use-after-free and NULL pointer
dereferences at context initialization time. Both are due to
amdgpu_ctx_init() not holding the adev->pm.stable_pstate_ctx_lock, which
is otherwise used from both sysfs and the context code itself for
modifying and clearing the stored context pointer.
The second issue is that context fini can trample over the pstate
configuration set via sysfs. This is due to the restore state
(ctx->stable_pstate) being saved at context init time, and not if, or when
the context actually changes the pstate. As the context exits it will
therefore incorrectly restore to what was set before the sysfs override
was requested.
The simplest fix is to drastically simplify how the state is tracked, by
clearly defining the points at which pstate ownership is taken and
released, and to handle all transitions under the correct lock.
Instead of at context init time, the previous state is saved only at the
point the context overrides the current state, and is restored on context
exit only if the context is still the owner of the current override state.
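The ownership rule described above can be modelled in a few lines. This is a hypothetical userspace sketch, not the amdgpu code: the struct, field and function names are invented for illustration, the -EBUSY return for a competing context is an assumption, and every function assumes the caller already holds the lock that protects the device state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the device-wide pstate bookkeeping. */
struct pstate_dev {
    int pstate;        /* currently applied pstate */
    int saved;         /* state to restore when the owner exits */
    const void *owner; /* context that owns the override, or NULL */
};

/* Context override: save the previous state only when ownership is taken. */
static int ctx_set_pstate(struct pstate_dev *dev, const void *ctx, int pstate)
{
    assert(dev != NULL && ctx != NULL);
    if (dev->owner && dev->owner != ctx)
        return -EBUSY; /* assumption: another context owns the override */
    if (!dev->owner) {
        dev->saved = dev->pstate;
        dev->owner = ctx;
    }
    dev->pstate = pstate;
    return 0;
}

/* sysfs override: clears context ownership. */
static void sysfs_set_pstate(struct pstate_dev *dev, int pstate)
{
    dev->owner = NULL;
    dev->pstate = pstate;
}

/* Context exit: restore only if this context still owns the override. */
static void ctx_fini(struct pstate_dev *dev, const void *ctx)
{
    if (dev->owner == ctx) {
        dev->pstate = dev->saved;
        dev->owner = NULL;
    }
}
```

In this model a sysfs write that happens after a context override survives the context's exit, because the exiting context is no longer the owner.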
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 79610d304133 ("drm/amdgpu: fix pstate setting issue")
Cc: Chengming Gui <Jack.Gui@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1b5e413713c0a93bc1818394d0ce49aaad21bd27)
Cc: <stable@vger.kernel.org> # v6.1+
|
|
Several kfd ioctls need to transfer array data from/to user space. The kfd
driver uses kmalloc_array with a user-provided size, which can lead to an
oversized allocation or a 32-bit wrap with a hostile value. Replace it with
memdup_array_user, which does overflow checking and allocates through
dedicated slab caches, and is also physically contiguous like kmalloc.
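The 32-bit wrap mentioned above is easy to demonstrate outside the kernel. The sketch below is a userspace illustration with hypothetical helper names; it does not reproduce the memdup_array_user implementation, only the kind of multiplication check such a helper has to perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked 32-bit size computation: wraps silently for large counts. */
static uint32_t array_bytes_unchecked(uint32_t n, uint32_t elem_size)
{
    return n * elem_size;
}

/* Checked size computation: fails instead of wrapping. */
static bool array_bytes_checked(size_t n, size_t elem_size, size_t *out)
{
    assert(out != NULL);
    if (elem_size != 0 && n > SIZE_MAX / elem_size)
        return false; /* n * elem_size would not fit in size_t */
    *out = n * elem_size;
    return true;
}
```

With a count of 0x40000001 and 4-byte elements, the unchecked form wraps to 4 bytes, which is how a hostile count can end up paired with a tiny allocation.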
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4eca4742eb215951f9739ffe0122d179d545a7a4)
|
|
If the index returned by find_first_zero_bit is out of range, there is no need to set the bit in doorbell_bitmap.
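A find-first-zero search conventionally returns the bitmap size when no bit is free, so the result has to be range-checked before it is used as an index. The following userspace sketch shows that pattern; the helper names and the -ENOSPC return value are assumptions for illustration, not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64u

/* Return the index of the first clear bit, or nbits if every bit is set. */
static unsigned int first_zero_bit(const uint64_t *bitmap, unsigned int nbits)
{
    for (unsigned int i = 0; i < nbits; i++) {
        uint64_t mask = UINT64_C(1) << (i % BITS_PER_WORD);

        if (!(bitmap[i / BITS_PER_WORD] & mask))
            return i;
    }
    return nbits;
}

/* Claim a free slot; leave the bitmap untouched when none is available. */
static int alloc_slot(uint64_t *bitmap, unsigned int nbits)
{
    unsigned int idx;

    assert(bitmap != NULL);
    idx = first_zero_bit(bitmap, nbits);
    if (idx >= nbits)
        return -ENOSPC; /* out of range: do not set any bit */
    bitmap[idx / BITS_PER_WORD] |= UINT64_C(1) << (idx % BITS_PER_WORD);
    return (int)idx;
}
```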
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2664ce9143d174651a793d96a6a2326050c4f45a)
|
|
The amdkfd driver needs to allocate a buffer to return BO metadata to user
space. The buffer size is currently controlled by the user. This is a
potential security issue: a hostile value (e.g. 2 GiB) lets any render-group
user trigger an order-MAX allocation/OOM in kernel context.
This patch first finds the BO metadata size. If that size is smaller than the
user-provided value, the driver can safely allocate a buffer in kernel space
and copy it to the user-space buffer. If not, the driver lets the user know
and does not allocate or copy. The user then retries with a new buffer.
This patch lets the driver decide the buffer allocation size to avoid a
potentially hostile size from user space.
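The size negotiation described above follows a common query-then-retry pattern. Below is a minimal userspace sketch of it; the function name and result codes are hypothetical, and treating an exact fit as acceptable is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum { META_OK = 0, META_TOO_SMALL = 1 };

/*
 * Copy metadata only when the caller's buffer can hold it. The real size
 * is always reported so the caller can retry with a larger buffer.
 */
static int copy_metadata(const void *meta, size_t meta_size,
                         void *user_buf, size_t user_size, size_t *needed)
{
    assert(needed != NULL);
    *needed = meta_size;
    if (meta_size > user_size)
        return META_TOO_SMALL; /* nothing allocated, nothing copied */
    if (meta_size != 0)
        memcpy(user_buf, meta, meta_size);
    return META_OK;
}
```

The amount of work done is bounded by the real metadata size, never by the number the caller supplied.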
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f54ce9e8cbd3abe0eda3a285f54dc4f572fe589a)
|
|
When dumping IB contents from a hung job, amdgpu_devcoredump_format()
acquired the VM root PD's reservation via amdgpu_vm_lock_by_pasid() and
then, for each IB, called amdgpu_bo_reserve() on the BO backing the IB.
Both reservations are reservation_ww_class_mutex objects and neither
used a ww_acquire_ctx, which trips lockdep:
WARNING: possible recursive locking detected
--------------------------------------------
kworker/u128:0 is trying to acquire lock:
ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4},
at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
but task is already holding lock:
ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4},
at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
Possible unsafe locking scenario:
CPU0
----
lock(reservation_ww_class_mutex);
lock(reservation_ww_class_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
Call Trace:
__ww_mutex_lock.constprop.0
ww_mutex_lock
amdgpu_bo_reserve
amdgpu_devcoredump_format+0x1594 [amdgpu]
amdgpu_devcoredump_deferred_work+0xea [amdgpu]
The two reservations are on different BOs in the captured trace, so the
splat is a lockdep-correctness warning, not an observed deadlock. It
becomes a real self-deadlock whenever the IB BO shares its dma_resv with
the root PD (the always-valid case, see amdgpu_vm_is_bo_always_valid()):
amdgpu_bo_reserve(abo) re-acquires the same ww_mutex without a ticket
and blocks forever. With amdgpu.gpu_recovery=0 the timeout handler
refires every ~2 s and each invocation produces this splat, drowning the
kernel ring buffer.
Now that amdgpu_vm_lock_by_pasid() takes a drm_exec context, move the IB
dumping into a separate helper that locks the root PD and every IB BO
together in a single drm_exec ticket. DRM_EXEC_IGNORE_DUPLICATES handles
IB BOs that share a dma_resv (e.g. always-valid BOs, or two IBs backed
by the same BO). Every lock is now a top-level acquire under one
ww_acquire_ctx, so the recursive ww_mutex condition is gone, and the
per-IB amdgpu_bo_reserve()/amdgpu_bo_unref() dance -- including a BO
refcount leak on the amdgpu_bo_reserve() failure path -- is removed.
Fixes: 7b15fc2d1f1a ("drm/amdgpu: dump job ibs in the devcoredump")
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d6bf4242731219ee08ce54c365631e395486651e)
|
|
amdgpu_vm_lock_by_pasid() looks up a VM by PASID and reserves its root
PD with a bare amdgpu_bo_reserve(), returning the still-reserved root to
the caller. A caller that then needs to reserve further BOs (for example
the devcoredump IB dump) ends up nesting reservation_ww_class_mutex
acquires without a ww_acquire_ctx, which lockdep flags as recursive
locking.
Convert the helper to take a drm_exec context and lock the root PD with
drm_exec_lock_obj(). Callers now run it inside a
drm_exec_until_all_locked() loop and can lock additional BOs in the same
ww ticket, so there is no nested ww_mutex acquire.
The drm_exec context holds its own reference on the locked root BO, so
the helper no longer hands a root reference back to the caller: the
root output parameter is dropped, and the transient reference taken
across the PASID lookup is released before returning.
The only existing caller, amdgpu_vm_handle_fault(), is updated
accordingly. Its is_compute_context path, which previously dropped the
root reservation around svm_range_restore_pages() and re-took it, now
finalises the drm_exec context and re-initialises a fresh one; behaviour
is otherwise unchanged.
No functional change intended for the page-fault path.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 14682de8ad377bf13ea66e47c26dcfea0b19a21d)
|
|
UTS_RELEASE evaluates to a static string and changes quite easily (e.g.
with uncommitted changes in the source tree or new commits). So when checking
whether a patch changes the resulting binary, each usage of
UTS_RELEASE is a source of annoyance.
Instead of using UTS_RELEASE directly, use init_utsname()->release, which
evaluates to the same string, so that a change of UTS_RELEASE
doesn't affect amdgpu_dev_coredump.o.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20260428144704.1114562-2-u.kleine-koenig@baylibre.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d785df5598fd1d1cc2f2f45c05448271b6d490b7)
|
|
Move kfd_process_remove_sysfs() earlier in kfd_process_wq_release() so
that all sysfs/procfs entries are removed before tearing down PDDs and
dropping lead_thread. The per-process sysfs attributes are backed by
struct kfd_process_device, and their show/store callbacks dereference
PDD fields. Since sysfs removal waits for active callbacks to complete,
removing these entries first closes a race where userspace reads sdma_*
and stats_* files after PDD teardown.
Previously this cleanup ran after kfd_process_destroy_pdds(), which
resets p->n_pdds to 0. This meant kfd_process_remove_sysfs() could no
longer walk the PDD array, so the per-PDD sysfs cleanup did not run as
intended.
This race caused NULL pointer dereferences observed in
kfd_sdma_activity_worker and kfd_procfs_stats_show.
Also harden kfd_process_remove_sysfs() against partially
initialized or already-freed objects:
- Check kobj_queues before removing PASID and deleting it
- Guard kobj_stats and kobj_counters before use
These checks prevent invalid dereferences during cleanup.
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Geoffrey McRae <geoffrey.mcrae@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 674c692702341fed321720b4b92036c5934fb485)
|
|
Add a minimum-length check for the AMDGPU_CHUNK_ID_CP_GFX_SHADOW chunk in
amdgpu_cs_pass1(), matching the gate already present for the IB, FENCE and
BO_HANDLES chunk types.
The CP_GFX_SHADOW case previously shared a bare break with the dependency
and syncobj chunk types, which do not dereference a fixed-size struct. When
userspace submits this chunk with length_dw == 0, vmemdup_array_user() is
called with size 0 and returns ZERO_SIZE_PTR, which passes the IS_ERR()
check. amdgpu_cs_p2_shadow() then dereferences chunk->kdata as a struct
drm_amdgpu_cs_chunk_cp_gfx_shadow (reading shadow->flags), faulting on the
ZERO_SIZE_PTR and causing a NULL-pointer dereference.
This is reachable by an unprivileged process in the render group. Reject
undersized chunks with -EINVAL during pass1 so the bad submission is
rejected before pass2 ever dereferences the data.
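The length gate described above can be sketched as a standalone check. The struct below is a hypothetical stand-in for a fixed-size chunk payload and the function name is invented; only the shape of the check (reject before any cast or dereference) reflects the text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fixed-size chunk payload (32 bytes). */
struct shadow_chunk {
    uint64_t shadow_va;
    uint64_t csa_va;
    uint64_t gds_va;
    uint64_t flags;
};

/* Reject chunks too short to hold the struct before anyone reads it. */
static int check_chunk_len(uint32_t length_dw)
{
    size_t bytes = (size_t)length_dw * sizeof(uint32_t);

    assert(sizeof(struct shadow_chunk) % sizeof(uint32_t) == 0);
    if (bytes < sizeof(struct shadow_chunk))
        return -EINVAL;
    return 0;
}
```

A zero-length chunk fails this check immediately, so later code never sees a zero-size buffer where it expects a full struct.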
Fixes: ac9287055ff1 ("drm/amdgpu: add gfx shadow CS IOCTL support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7f61b2eef7415eccdb40850aca0de94211948657)
Cc: stable@vger.kernel.org
|
|
The AMDGPU_GEM_OP_GET_MAPPING_INFO path of amdgpu_gem_op_ioctl() looks
up the bo_va for the buffer object in the caller's VM via
amdgpu_vm_bo_find(), but uses the returned pointer without checking it.
amdgpu_vm_bo_find() returns NULL when the BO has no bo_va in that VM,
which is the normal case for a BO that has never been mapped. The result
is fed straight into amdgpu_vm_bo_va_for_each_valid_mapping(), which
expands to list_for_each_entry(mapping, &(bo_va)->valids, list) and
dereferences bo_va, causing a NULL pointer dereference.
This is reachable by any process able to issue the ioctl (render group)
simply by requesting mapping info for an unmapped BO.
Return -ENOENT when no bo_va is found, jumping to out_exec so the
drm_exec context and GEM object reference are released.
Fixes: 4d82724f7f2b ("drm/amdgpu: Add mapping info option for GEM_OP ioctl")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 528b19377affc1cc7362a70a254c1dda793595f9)
Cc: stable@vger.kernel.org
|
|
If there is an early failure during amdgpu probe, like missing firmware, it
will end up calling amdgpu_irq_disable_all, which takes irq.lock spinlock
without it being initialized.
Initializing irq.lock earlier at amdgpu_device_init fixes the issue.
[ 79.334079] INFO: trying to register non-static key.
[ 79.334081] The code is fine but needs lockdep annotation, or maybe
[ 79.334083] you didn't initialize this object before use?
[ 79.334084] turning off the locking correctness validator.
[ 79.334088] CPU: 2 UID: 0 PID: 1819 Comm: bash Not tainted 7.1.0-rc5-gfd06300b2348 #96 PREEMPT 8e8f461221633dae3c832d6689eaf0546c0ed4cd
[ 79.334092] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0133 08/05/2024
[ 79.334094] Call Trace:
[ 79.334095] <TASK>
[ 79.334097] dump_stack_lvl+0x5d/0x80
[ 79.334103] register_lock_class+0x7af/0x7c0
[ 79.334109] __lock_acquire+0x416/0x2610
[ 79.334114] lock_acquire+0xcf/0x310
[ 79.334117] ? amdgpu_irq_disable_all+0x3b/0xf0 [amdgpu c88bab43d391d519ad0d5c8e5a099b4aceefa180]
[ 79.334503] ? _raw_spin_lock_irqsave+0x53/0x60
[ 79.334508] _raw_spin_lock_irqsave+0x3f/0x60
[ 79.334510] ? amdgpu_irq_disable_all+0x3b/0xf0 [amdgpu c88bab43d391d519ad0d5c8e5a099b4aceefa180]
[ 79.334881] amdgpu_irq_disable_all+0x3b/0xf0 [amdgpu c88bab43d391d519ad0d5c8e5a099b4aceefa180]
[ 79.335240] amdgpu_device_fini_hw+0x90/0x32c [amdgpu c88bab43d391d519ad0d5c8e5a099b4aceefa180]
[ 79.335704] amdgpu_driver_load_kms.cold+0x22/0x44 [amdgpu c88bab43d391d519ad0d5c8e5a099b4aceefa180]
[ 79.336159] amdgpu_pci_probe+0x204/0x440 [amdgpu c88bab43d391d519ad0d5c8e5a099b4aceefa180]
[ 79.336494] local_pci_probe+0x3c/0x80
[ 79.336500] pci_call_probe+0x55/0x2e0
[ 79.336505] ? _raw_spin_unlock+0x2d/0x50
[ 79.336508] ? pci_match_device+0x157/0x180
[ 79.336512] pci_device_probe+0x9b/0x170
[ 79.336516] really_probe+0xd5/0x370
[ 79.336521] __driver_probe_device+0x84/0x150
[ 79.336525] device_driver_attach+0x47/0xb0
[ 79.336528] bind_store+0x73/0xc0
[ 79.336531] kernfs_fop_write_iter+0x176/0x250
[ 79.336536] vfs_write+0x24d/0x560
[ 79.336542] ksys_write+0x71/0xe0
[ 79.336546] do_syscall_64+0x122/0x710
[ 79.336550] ? do_syscall_64+0xd1/0x710
[ 79.336553] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 79.336557] RIP: 0033:0x7f92fd675006
[ 79.336561] Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08
[ 79.336562] RSP: 002b:00007ffe4fa867a0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 79.336565] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f92fd675006
[ 79.336567] RDX: 000000000000000d RSI: 000055b2dfce59b0 RDI: 0000000000000001
[ 79.336568] RBP: 00007ffe4fa867c0 R08: 0000000000000000 R09: 0000000000000000
[ 79.336569] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000d
[ 79.336570] R13: 000055b2dfce59b0 R14: 00007f92fd7ca5c0 R15: 000055b2dfdbaf70
[ 79.336574] </TASK>
Fixes: 9950cda2a018 ("drm/amdgpu: drop the drm irq pre/post/un install callbacks")
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7dba3e10ecdeec85208e255853fcd3890880b10e)
|
|
The cleanup tail of kfd_criu_resume_svm() walks
svms->criu_svm_metadata_list and kfree()s each struct criu_svm_metadata
without removing it from the list. The list head is left pointing at
freed kmalloc-96 objects.
A second AMDKFD_IOC_CRIU_OP from the same process re-enters: list_empty()
reads the dangling ->next (use-after-free), the loop walks freed entries,
and each is kfree()'d again (double-free). This is reachable by an
unprivileged render-group user via /dev/kfd with no capabilities required.
Add list_del() before the kfree() so the list is properly emptied. The
list_for_each_entry_safe() iterator already caches the next pointer, so
unlinking during the walk is safe.
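The unlink-then-free pattern can be shown with a small userspace list. This sketch uses a hand-written circular doubly linked list with hypothetical names rather than the kernel's list.h; the walk caches the next pointer before freeing, mirroring what a "safe" iterator does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    struct node *prev, *next;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static bool list_is_empty(const struct node *head)
{
    return head->next == head;
}

static void list_add_tail(struct node *head, struct node *n)
{
    assert(n != NULL);
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_unlink(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
}

/* Unlink and free every entry; returns how many entries were freed. */
static int drain(struct node *head)
{
    int freed = 0;
    struct node *cur = head->next;

    while (cur != head) {
        struct node *next = cur->next; /* cache before the entry is freed */

        list_unlink(cur); /* without this the head keeps dangling pointers */
        free(cur);
        freed++;
        cur = next;
    }
    return freed;
}
```

Because each entry is unlinked before it is freed, the head is empty afterwards and a second drain finds nothing to walk.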
Fixes: 2a909ae71871 ("drm/amdkfd: CRIU resume shared virtual memory ranges")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6322d278a298e2c1430b9d2697743d3a04b788b1)
|
|
r100_copy_blit() copies BOs as 1024-pixel-wide ARGB8888 blits, so one
GPU page becomes one blit row. Large copies are split into chunks of at
most 8191 rows.
The kernel register header names the packet coordinate dwords SRC_Y_X
and DST_Y_X. In the BITBLT_MULTI description in
R5xx_Acceleration_v1.5.pdf docs, these correspond to [SRC_X1 | SRC_Y1]
and [DST_X1 | DST_Y1], which are signed 13-bit coordinates in the
-8192..8191 range. The old code kept SRC/DST_PITCH_OFFSET at the BO base
and used SRC_Y_X/DST_Y_X as the chunk address, so large BO moves could
exceed that coordinate range.
Compute per-chunk SRC/DST_PITCH_OFFSET bases and emit zero source and
destination coordinates. r100_copy_blit() already packs
SRC/DST_PITCH_OFFSET as pitch plus base offset, so large chunk addresses
belong there rather than in the coordinate fields.
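The range problem can be checked with simple arithmetic. The sketch below is a simplified model, not the radeon code: it assumes each chunk is full (8191 rows), that the old scheme's Y coordinate equals the chunk's starting row, and that a row is 1024 ARGB8888 pixels (4096 bytes). All helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ROW_BYTES 4096u /* 1024 pixels * 4 bytes per ARGB8888 pixel */
#define MAX_ROWS  8191u /* maximum rows per blit chunk, per the text */

/* Does v fit in a signed 13-bit coordinate (-8192..8191)? */
static bool fits_s13(int64_t v)
{
    return v >= -8192 && v <= 8191;
}

/* Y coordinate a chunk would need if the base stayed at the BO start. */
static int64_t old_scheme_y(uint32_t chunk_index)
{
    return (int64_t)chunk_index * MAX_ROWS;
}

/* Per-chunk base address when the offset moves and coordinates stay 0. */
static uint64_t chunk_base(uint64_t bo_base, uint32_t chunk_index)
{
    assert(bo_base % ROW_BYTES == 0);
    return bo_base + (uint64_t)chunk_index * MAX_ROWS * ROW_BYTES;
}
```

Under these assumptions the third chunk (index 2) would need Y = 16382, which no longer fits the coordinate field, whereas moving the base keeps the coordinates at zero for every chunk.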
This fixes Prison Architect corruption with 4096x4096 mipped textures
after they are evicted to GTT under memory pressure on RV530.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/6716
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 87be26aee76239c6da03e599f238a426897f78ad)
Cc: stable@vger.kernel.org
|
|
[Why]
amdgpu_dm_crtc_mem_type_changed() fetches the "old" and "new" plane state
with two drm_atomic_get_plane_state() calls, which both return the new
state. It compares a state against itself, so it never detects a mem_type
change and never rejects the async flip.
On DCN 3.0.1, this shows up as intermittent corruption when a single DCC
plane is scanned out with immediate flips under gamescope and its buffer
moves between the VRAM carveout and GTT.
[How]
Use drm_atomic_get_old_plane_state() and drm_atomic_get_new_plane_state()
to compare the actual old and new states. These return NULL rather than
an error pointer for a plane that is not part of the commit, so the
IS_ERR() check becomes a NULL check that skips those planes, such as an
unmodified cursor still in the CRTC's plane_mask.
Fixes: 4caacd1671b7 ("drm/amd/display: Do not elevate mem_type change to full update")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 13158e5dbd896281f3e9982b5437cffa5fd621b2)
|
|
[Why]
The DRM core exposes an IN_FORMATS_ASYNC plane property describing the
set of format/modifier pairs that are valid for asynchronous (immediate)
page flips. amdgpu already advertises async page flip support via
mode_config.async_page_flip = true, but never implemented the
.format_mod_supported_async plane callback, so the IN_FORMATS_ASYNC
property was not created.
This inconsistency (advertising async flips while exposing IN_FORMATS but
no IN_FORMATS_ASYNC) causes userspace, such as igt-gpu-tools, to emit a
repeated warning during plane initialization, which in turn demotes many
otherwise passing KMS subtests to a WARN result.
[How]
Wire up .format_mod_supported_async to the existing
amdgpu_dm_plane_format_mod_supported callback so the async format list is
populated. amdgpu does not restrict async flips at the format/modifier
level: the async flip constraints are enforced at atomic check and commit
time and only require a fast update (no change to FB pitch, DCC state,
rotation or memory type) between the old and new buffers. Therefore the
set of formats/modifiers valid for async flips is identical to the
regular IN_FORMATS set, and the same callback can be reused.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: James Lin <PingLei.Lin@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8e2d7bbd6b184c0c1b0fe7cb404c9b5214d89931)
|
|
The cleaner shader sysfs path allocates a 16-dword (64 byte) IB but
incorrectly fills (align_mask + 1) dwords. On GFX rings align_mask is
0xff, so the loop wrote 256 dwords into a 64-byte buffer, causing a
kernel page fault.
The IB only needs to be a minimal NOP shell to schedule the job; the
cleaner shader itself is emitted on the ring via emit_cleaner_shader().
Fill 16 dwords to match the allocation.
v2: Use ib_size_dw variable (Lijo)
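The size mismatch can be verified with the numbers given above. This is an arithmetic sketch only; the constant and helper names are hypothetical, apart from ib_size_dw, which the v2 note mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IB_SIZE_DW 16u /* allocated IB size in dwords (64 bytes) */

/* Dwords the old loop filled: align_mask + 1. */
static size_t old_fill_dw(uint32_t align_mask)
{
    return (size_t)align_mask + 1;
}

/* True when a fill of fill_dw dwords overruns an alloc_dw-dword buffer. */
static bool fill_overflows(size_t fill_dw, size_t alloc_dw)
{
    return fill_dw > alloc_dw;
}

/* Fill exactly ib_size_dw dwords with a NOP value, matching the allocation. */
static void fill_ib(uint32_t *ib, size_t ib_size_dw, uint32_t nop)
{
    assert(ib != NULL);
    for (size_t i = 0; i < ib_size_dw; i++)
        ib[i] = nop;
}
```

With align_mask = 0xff the old loop count is 256 dwords (1024 bytes) against a 64-byte allocation.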
Fixes: d361ad5d2fc0 ("drm/amdgpu: Add sysfs interface for running cleaner shader")
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bf21af331ebf72d0935fd70c73192414a422c03a)
CC: stable@vger.kernel.org
|
|
Replace the stack-allocated amdgpu_lockdep mutex with a heap allocation
via kmalloc to fix a stack overflow caused by the large struct size.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dbae980eefb2f46f31cee12f1f8540d0d79f61ae)
|
|
SMI events were reporting incorrect PIDs in containerized environments,
causing test failures where container processes expected to see their
namespace-local PIDs but instead received global host PIDs.
The issue had two root causes:
1. Event functions were called from kernel context (page fault handlers,
migration workers) where 'current' refers to the kernel worker thread,
not the userspace GPU process that triggered the event.
2. PID conversion used task_tgid_vnr() which returns the PID in the
caller's namespace (init namespace for kernel threads), not the task's
own namespace.
This patch updates the SMI event interface:
- Change 8 event function signatures to accept task_struct pointer
instead of pid_t, allowing proper namespace-aware PID conversion
- Convert PIDs using task_tgid_nr_ns(task, task_active_pid_ns(task))
which returns the PID as the process sees it via getpid()
- Update 10 call sites to pass p->lead_thread (the GPU process)
instead of p->lead_thread->pid or current (kernel worker)
This ensures SMI events report container-local PIDs, which is critical
for containerized GPU workloads to correctly correlate events with their
processes.
Tested-by: Andrew Martin <andmarti@amd.com>
Assisted-by: Claude:Sonnet 4-5
Signed-off-by: Andrew Martin <andrew.martin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 60271ec06e04ba5d69d68714f3abdf637d86c257)
|
|
[Why&How]
Periodic detection callbacks were removed from DCN35 for higher IPS
residency, causing some displays to fail to recover after DPMS sleep. The
monitors bounce HPD ~1.2s after link training, and without periodic
detection the system enters IPS with no mechanism to wake and rediscover
the display.
Restore the periodic detection calls in dcn35_clk_mgr for now. It should
be replaced with a proper IPS-aware solution long term using DMUB.
Also remove it from dcn31 and dcn314_clk_mgr.c. They do not have IPS,
so the removal should not affect them.
Fixes: 3f6c060846be ("drm/amd/display: Remove periodic detection callbacks from dcn35+")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5318
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0c300e6a76916e944b6b18a64c73f7895a0fee87)
Cc: stable@vger.kernel.org
|
|
[Why]
Some 8K displays cannot tolerate the reduced phy ssc value
at high link utilization and show corruption or black screen.
[How]
Add an EDID panel-id quirk to utilize existing skip_phy_ssc_reduction flag.
To pass the link into the quirk handler, change the signature of
apply_edid_quirks() to take link as an argument. The dev local in
dm_helpers_parse_edid_caps() becomes unused and is removed.
Fixes: 5fa62c87cffd ("drm/amd/display: Add option to disable PHY SSC reduction on transmitter enable")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 144169e7be0831e09958a906d08d1856751aa6c6)
|
|
The GPU reload test (S3 / mode1 reset / module reload) triggers a
WARN_ON in amdgpu_irq_put() on gfx10 when unloading amdgpu:
WARNING: CPU: 0 PID: 2314 at amd/amdgpu/amdgpu_irq.c:676 amdgpu_irq_put+0xc3/0xe0 [amdgpu]
Call Trace:
gfx_v10_0_hw_fini+0x41/0x150 [amdgpu]
amdgpu_ip_block_hw_fini+0x29/0xc0 [amdgpu]
amdgpu_device_fini_hw+0x315/0x610 [amdgpu]
amdgpu_driver_unload_kms+0x7c/0x90 [amdgpu]
amdgpu_pci_remove+0x51/0x90 [amdgpu]
amdgpu_device_ip_resume_phase2() skips IP blocks whose status.hw is
already set, but amdgpu_device_ip_suspend_phase2() never had the
matching guard, so a block can be suspended twice (e.g. a reset or
recovery issued while the device is already suspended). The second
suspend runs hw_fini again, which now releases the gfx fault IRQs
unconditionally, dropping a refcount that is already zero and tripping
the WARN_ON in amdgpu_irq_put().
The fault/EOP IRQ get/put were balanced through late_init/hw_fini
before, which masked the double-suspend; moving the get into hw_init
made the suspend/resume asymmetry visible as an IRQ refcount underflow.
Honor status.hw in ip_suspend_phase2() so suspend mirrors resume and a
block is only torn down once.
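The refcount underflow and the effect of the guard can be modelled in userspace. This is a hypothetical sketch: the struct and function names are invented, and a counter stands in for the WARN_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an IP block with one refcounted IRQ. */
struct ip_block {
    bool hw;        /* true while the hardware is initialised */
    int irq_refs;   /* IRQ reference count */
    int underflows; /* stands in for the WARN_ON firing */
};

static void irq_put(struct ip_block *b)
{
    if (b->irq_refs == 0) {
        b->underflows++; /* put without a matching get */
        return;
    }
    b->irq_refs--;
}

static void hw_init(struct ip_block *b)
{
    assert(b != NULL);
    b->irq_refs++;
    b->hw = true;
}

/* Suspend without the guard: tears the block down every time. */
static void suspend_unguarded(struct ip_block *b)
{
    irq_put(b);
    b->hw = false;
}

/* Suspend with the guard: skips blocks that are already down. */
static void suspend_guarded(struct ip_block *b)
{
    if (!b->hw)
        return;
    irq_put(b);
    b->hw = false;
}
```

Calling the unguarded suspend twice after one init records an underflow; the guarded version makes the second call a no-op.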
Fixes: 9117d8be850b ("drm/amdgpu/gfx: move fault and EOP IRQ get/put to hw_init/hw_fini")
Fixes: 482f0e538580 ("drm/amdgpu: fix double ucode load by PSP(v3)")
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f44f2af13c418969be358b15743f939d705de998)
|
|
When kfd_queue_acquire_buffers() was split off from
set_queue_properties_from_user(), set_queue_properties_from_criu()
was missed. Thus, set_queue_properties_from_criu() is not
filling out the buffer fields of queue_properties, which
can come up when subsequent code expects them to be non-null.
Add the proper call to kfd_queue_acquire_buffers(), and also
use the right cast types in set_queue_properties_from_criu()
(which were missed at the same time).
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 88ed96abbbe27b70193544fbc1ee06448c274714)
|
|
During smu_v15_0_0_system_features_control(), the driver sends a
PrepareMp1ForUnload message to PMFW. PMFW then performs nBIF and SYSHUB
function-level resets (FLR), disabling PCIe CFG space reset, which
clears the framebuffer enable bit to zero and disables MC (memory controller)
access from the host.
Re-enable MC access via the nbio mc_access_enable callback right after
PrepareMp1ForUnload completes in smu_v15_0_0_system_features_control().
Signed-off-by: Shubhankar Milind Sardeshpande <Shubhankar.MilindSardeshpande@amd.com>
Signed-off-by: Suresh Guttula <Suresh.Guttula@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 840a3c5aeae779a3bc75d7f747c3ed18b1af6507)
Cc: stable@vger.kernel.org
|