Age | Commit message (Collapse) | Author | Files | Lines |
|
Patch series "Memory allocation profiling", v6.
Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels, overhead low enough to be deployed in production.
Example output:
root@moria-kvm:~# sort -rn /proc/allocinfo
127664128 31168 mm/page_ext.c:270 func:alloc_page_ext
56373248 4737 mm/slub.c:2259 func:alloc_slab_page
14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded
14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash
13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio
9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node
4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
3940352 962 mm/memory.c:4214 func:alloc_anon_folio
2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
...
Usage:
kconfig options:
- CONFIG_MEM_ALLOC_PROFILING
- CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
- CONFIG_MEM_ALLOC_PROFILING_DEBUG
adds warnings for allocations that weren't accounted because of a
missing annotation
sysctl:
/proc/sys/vm/mem_profiling
Runtime info:
/proc/allocinfo
Notes:
[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
(6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y
(7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y
Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:
kmalloc pgalloc
(1 baseline) 6.764s 16.902s
(2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%)
(3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%)
(4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%)
(5 memcg) 13.388s (+97.94%) 48.460s (+186.71%)
(6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%)
(7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%)
Memory overhead:
Kernel size:
text data bss dec diff
(1) 26515311 18890222 17018880 62424413
(2) 26524728 19423818 16740352 62688898 264485
(3) 26524724 19423818 16740352 62688894 264481
(4) 26524728 19423818 16740352 62688898 264485
(5) 26541782 18964374 16957440 62463596 39183
Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags: 192 kB
PageExts: 262144 kB (256MB)
SlabExts: 9876 kB (9.6MB)
PcpuExts: 512 kB (0.5MB)
Total overhead is 0.2% of total memory.
Benchmarks:
Hackbench tests run 100 times:
hackbench -s 512 -l 200 -g 15 -f 25 -P
baseline disabled profiling enabled profiling
avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023)
stdev 0.0137 0.0188 0.0077
hackbench -l 10000
baseline disabled profiling enabled profiling
avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859)
stdev 0.0933 0.0286 0.0489
stress-ng tests:
stress-ng --class memory --seq 4 -t 60
stress-ng --class cpu --seq 4 -t 60
Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/
[2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/
This patch (of 37):
The next patch drops vmalloc.h from a system header in order to fix a
circular dependency; this adds it to all the files that were pulling it in
implicitly.
[kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c]
Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev
[surenb@google.com: fix arch/x86/mm/numa_32.c]
Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com
[kent.overstreet@linux.dev: a few places were depending on sizes.h]
Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev
[arnd@arndb.de: fix mm/kasan/hw_tags.c]
Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org
[surenb@google.com: fix arc build]
Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
vmw_context_cotable can return either an error or a null pointer and its
usage sometimes went unchecked. Subsequent code would then try to access
either a null pointer or an error value.
The invalid dereferences were only possible with malformed userspace
apps which never properly initialized the rendering contexts.
Check the results of vmw_context_cotable to fix the invalid derefs.
Thanks:
ziming zhang(@ezrak1e) from Ant Group Light-Year Security Lab
who was the first person to discover it.
Niels De Graef who reported it and helped to track down the poc.
Fixes: 9c079b8ce8bf ("drm/vmwgfx: Adapt execbuf to the new validation api")
Cc: <stable@vger.kernel.org> # v4.20+
Reported-by: Niels De Graef <ndegraef@redhat.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Martin Krastev <martin.krastev@broadcom.com>
Cc: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Cc: Ian Forbes <ian.forbes@broadcom.com>
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240110200305.94086-1-zack.rusin@broadcom.com
|
|
Without this definition device errors will display the command name
as (null) when debug logging is enabled.
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240108211655.13187-1-ian.forbes@broadcom.com
|
|
Fix typos in vmwgfx_execbuf.c.
Signed-off-by: Ghanshyam Agrawal <ghanshyam1898@gmail.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231215053016.552019-1-ghanshyam1898@gmail.com
|
|
Surfaces can be backed (i.e. stored in) memory objects (mob's) which
are created and managed by the userspace as GEM buffers. Surfaces
grab only a ttm reference which means that the gem object can
be deleted underneath us, especially in cases where prime buffer
export is used.
Make sure that all userspace surfaces which are backed by gem objects
hold a gem reference to make sure they're not deleted before vmw
surfaces are done with them, which fixes:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 2 PID: 2632 at lib/refcount.c:28 refcount_warn_saturate+0xfb/0x150
Modules linked in: overlay vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock snd_ens1371 snd_ac97_codec ac97_bus snd_pcm gameport>
CPU: 2 PID: 2632 Comm: vmw_ref_count Not tainted 6.5.0-rc2-vmwgfx #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:refcount_warn_saturate+0xfb/0x150
Code: eb 9e 0f b6 1d 8b 5b a6 01 80 fb 01 0f 87 ba e4 80 00 83 e3 01 75 89 48 c7 c7 c0 3c f9 a3 c6 05 6f 5b a6 01 01 e8 15 81 98 ff <0f> 0b e9 6f ff ff ff 0f b>
RSP: 0018:ffffbdc34344bba0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
RDX: ffff960475ea1548 RSI: 0000000000000001 RDI: ffff960475ea1540
RBP: ffffbdc34344bba8 R08: 0000000000000003 R09: 65646e75203a745f
R10: ffffffffa5b32b20 R11: 72657466612d6573 R12: ffff96037d6a6400
R13: ffff9603484805b0 R14: 000000000000000b R15: ffff9603bed06060
FS: 00007f5fd8520c40(0000) GS:ffff960475e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5fda755000 CR3: 000000010d012005 CR4: 00000000003706e0
Call Trace:
<TASK>
? show_regs+0x6e/0x80
? refcount_warn_saturate+0xfb/0x150
? __warn+0x91/0x150
? refcount_warn_saturate+0xfb/0x150
? report_bug+0x19d/0x1b0
? handle_bug+0x46/0x80
? exc_invalid_op+0x1d/0x80
? asm_exc_invalid_op+0x1f/0x30
? refcount_warn_saturate+0xfb/0x150
drm_gem_object_handle_put_unlocked+0xba/0x110 [drm]
drm_gem_object_release_handle+0x6e/0x80 [drm]
drm_gem_handle_delete+0x6a/0xc0 [drm]
? __pfx_vmw_bo_unref_ioctl+0x10/0x10 [vmwgfx]
vmw_bo_unref_ioctl+0x33/0x40 [vmwgfx]
drm_ioctl_kernel+0xbc/0x160 [drm]
drm_ioctl+0x2d2/0x580 [drm]
? __pfx_vmw_bo_unref_ioctl+0x10/0x10 [vmwgfx]
? do_vmi_munmap+0xee/0x180
vmw_generic_ioctl+0xbd/0x180 [vmwgfx]
vmw_unlocked_ioctl+0x19/0x20 [vmwgfx]
__x64_sys_ioctl+0x99/0xd0
do_syscall_64+0x5d/0x90
? syscall_exit_to_user_mode+0x2a/0x50
? do_syscall_64+0x6d/0x90
? handle_mm_fault+0x16e/0x2f0
? exit_to_user_mode_prepare+0x34/0x170
? irqentry_exit_to_user_mode+0xd/0x20
? irqentry_exit+0x3f/0x50
? exc_page_fault+0x8e/0x190
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f5fda51aaff
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 7>
RSP: 002b:00007ffd536a4d30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd536a4de0 RCX: 00007f5fda51aaff
RDX: 00007ffd536a4de0 RSI: 0000000040086442 RDI: 0000000000000003
RBP: 0000000040086442 R08: 000055fa603ada50 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffd536a51b8
R13: 0000000000000003 R14: 000055fa5ebb4c80 R15: 00007f5fda90f040
</TASK>
---[ end trace 0000000000000000 ]---
A lot of the analyis on the bug was done by Murray McAllister and
Ian Forbes.
Reported-by: Murray McAllister <murray.mcallister@gmail.com>
Cc: Ian Forbes <iforbes@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Fixes: a950b989ea29 ("drm/vmwgfx: Do not drop the reference to the handle too soon")
Cc: <stable@vger.kernel.org> # v6.2+
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230928041355.737635-1-zack@kde.org
|
|
Since size of 'header' pointer and '*header' structure is equal on 64-bit
machines issue probably didn't cause any wrong behavior. But anyway,
fixing typo is required.
Fixes: 7a73ba7469cb ("drm/vmwgfx: Use TTM handles instead of SIDs as user-space surface handles.")
Co-developed-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com>
Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230905100203.1716731-1-konstantin.meskhidze@huawei.com
|
|
vmw_bo_unreference sets the input buffer to null on exit, resulting in
null ptr deref's on the subsequent drm gem put calls.
This went unnoticed because only very old userspace would be exercising
those paths but it wouldn't be hard to hit on old distros with brand
new kernels.
Introduce a new function that abstracts unrefing of user bo's to make
the code cleaner and more explicit.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reported-by: Ian Forbes <iforbes@vmware.com>
Fixes: 9ef8d83e8e25 ("drm/vmwgfx: Do not drop the reference to the handle too soon")
Cc: <stable@vger.kernel.org> # v6.4+
Reviewed-by: Maaz Mombasawala<mombasawalam@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230818041301.407636-1-zack@kde.org
|
|
For multiple commands the driver was not correctly validating the shader
stages resulting in possible kernel oopses. The validation code was only.
if ever, checking the upper bound on the shader stages but never a lower
bound (valid shader stages start at 1 not 0).
Fixes kernel oopses ending up in vmw_binding_add, e.g.:
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 2443 Comm: testcase Not tainted 6.3.0-rc4-vmwgfx #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:vmw_binding_add+0x4c/0x140 [vmwgfx]
Code: 7e 30 49 83 ff 0e 0f 87 ea 00 00 00 4b 8d 04 7f 89 d2 89 cb 48 c1 e0 03 4c 8b b0 40 3d 93 c0 48 8b 80 48 3d 93 c0 49 0f af de <48> 03 1c d0 4c 01 e3 49 8>
RSP: 0018:ffffb8014416b968 EFLAGS: 00010206
RAX: ffffffffc0933ec0 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: ffffb8014416b9c0 RDI: ffffb8014316f000
RBP: ffffb8014416b998 R08: 0000000000000003 R09: 746f6c735f726564
R10: ffffffffaaf2bda0 R11: 732e676e69646e69 R12: ffffb8014316f000
R13: ffffb8014416b9c0 R14: 0000000000000040 R15: 0000000000000006
FS: 00007fba8c0af740(0000) GS:ffff8a1277c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000007c0933eb8 CR3: 0000000118244001 CR4: 00000000003706e0
Call Trace:
<TASK>
vmw_view_bindings_add+0xf5/0x1b0 [vmwgfx]
? ___drm_dbg+0x8a/0xb0 [drm]
vmw_cmd_dx_set_shader_res+0x8f/0xc0 [vmwgfx]
vmw_execbuf_process+0x590/0x1360 [vmwgfx]
vmw_execbuf_ioctl+0x173/0x370 [vmwgfx]
? __drm_dev_dbg+0xb4/0xe0 [drm]
? __pfx_vmw_execbuf_ioctl+0x10/0x10 [vmwgfx]
drm_ioctl_kernel+0xbc/0x160 [drm]
drm_ioctl+0x2d2/0x580 [drm]
? __pfx_vmw_execbuf_ioctl+0x10/0x10 [vmwgfx]
? do_fault+0x1a6/0x420
vmw_generic_ioctl+0xbd/0x180 [vmwgfx]
vmw_unlocked_ioctl+0x19/0x20 [vmwgfx]
__x64_sys_ioctl+0x96/0xd0
do_syscall_64+0x5d/0x90
? handle_mm_fault+0xe4/0x2f0
? debug_smp_processor_id+0x1b/0x30
? fpregs_assert_state_consistent+0x2e/0x50
? exit_to_user_mode_prepare+0x40/0x180
? irqentry_exit_to_user_mode+0xd/0x20
? irqentry_exit+0x3f/0x50
? exc_page_fault+0x8b/0x180
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Signed-off-by: Zack Rusin <zackr@vmware.com>
Cc: security@openanolis.org
Reported-by: Ziming Zhang <ezrakiez@gmail.com>
Testcase-found-by: Niels De Graef <ndegraef@redhat.com>
Fixes: d80efd5cb3de ("drm/vmwgfx: Initial DX support")
Cc: <stable@vger.kernel.org> # v4.3+
Reviewed-by: Maaz Mombasawala<mombasawalam@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230616190934.54828-1-zack@kde.org
|
|
v3: Fix vmw_user_bo_lookup which was also dropping the gem reference
before the kernel was done with buffer depending on userspace doing
the right thing. Same bug, different spot.
It is possible for userspace to predict the next buffer handle and
to destroy the buffer while it's still used by the kernel. Delay
dropping the internal reference on the buffers until kernel is done
with them.
Instead of immediately dropping the gem reference in vmw_user_bo_lookup
and vmw_gem_object_create_with_handle let the callers decide when they're
ready give the control back to userspace.
Also fixes the second usage of vmw_gem_object_create_with_handle in
vmwgfx_surface.c which wasn't grabbing an explicit reference
to the gem object which could have been destroyed by the userspace
on the owning surface at any point.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Fixes: 8afa13a0583f ("drm/vmwgfx: Implement DRIVER_GEM")
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230211050514.2431155-1-zack@kde.org
|
|
Various bits of the driver used raw ttm_buffer_object instead of the
driver specific vmw_bo object. All those places used to duplicate
the mapped bo caching policy of vmw_bo.
Instead of duplicating all of that code and special casing various
functions to work both with vmw_bo and raw ttm_buffer_object's unify
the buffer object handling code.
As part of that work fix the naming of bo's, e.g. insted of generic
backup use 'guest_memory' because that's what it really is.
All of it makes the driver easier to maintain and the code easier to
read. Saves 100+ loc as well.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230131033542.953249-9-zack@kde.org
|
|
Problem with explicit placement selection in vmwgfx is that by the time
the buffer object needs to be validated the information about which
placement was supposed to be used is lost. To workaround this the driver
had a bunch of state in various places e.g. as_mob or cpu_blit to
somehow convey the information on which placement was intended.
Fix it properly by allowing the buffer objects to hold their preferred
placement so it can be reused whenever needed. This makes the entire
validation pipeline a lot easier both to understand and maintain.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230131033542.953249-8-zack@kde.org
|
|
The rest of the drivers which are using ttm have mostly standardized on
driver_prefix_bo as the name for subclasses of the TTM buffer object.
Make vmwgfx match the rest of the drivers and follow the same naming
semantics.
This is especially clear given that the name of the file in which the
object was defined is vmw_bo.c.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230131033542.953249-4-zack@kde.org
|
|
Due to holidays we started -next with more -fixes in-flight than
usual, and people have been asking where they are. Backmerge to get
things better in sync.
Conflicts:
- Tiny conflict in drm_fbdev_generic.c between variable rename and
missing error handling that got added.
- Conflict in drm_fb_helper.c between the added call to vgaswitcheroo
in drm_fb_helper_single_fb_probe and a refactor patch that extracted
lots of helpers and incidentally removed the dev local variable.
Readd it to make things compile.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
User resource lookups used rcu to avoid two extra atomics. Unfortunately
the rcu paths were buggy and it was easy to make the driver crash by
submitting command buffers from two different threads. Because the
lookups never show up in performance profiles replace them with a
regular spin lock which fixes the races in accesses to those shared
resources.
Fixes kernel oops'es in IGT's vmwgfx execution_buffer stress test and
seen crashes with apps using shared resources.
Fixes: e14c02e6b699 ("drm/vmwgfx: Look up objects without taking a reference")
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221207172907.959037-1-zack@kde.org
|
|
Merge and cleanup the two headers into a single description of the
object API. Also move all the documentation to the implementation and
drop unnecessary includes from the header.
No functional change.
v2: minimal checkpatch.pl cleanup
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221125102137.1801-4-christian.koenig@amd.com
|
|
Change ttm_resource structure from num_pages to size_t size in bytes.
v1 -> v2: change PFN_UP(dst_mem->size) to ttm->num_pages
v1 -> v2: change bo->resource->size to bo->base.size at some places
v1 -> v2: remove the local variable
v1 -> v2: cleanup cmp_size_smaller_first()
v2 -> v3: adding missing PFN_UP in ttm_bo_vm_fault_reserved
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221027091237.983582-1-Amaranath.Somalapuram@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
Fixes a warning about extra docs about a function argument that has been
removed a while back:
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:3888: warning: Excess function
parameter 'sync_file' description in 'vmw_execbuf_copy_fence_user'
Fixes: a0f90c881570 ("drm/vmwgfx: Fix stale file descriptors on failed usercopy")
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221022040236.616490-18-zack@kde.org
|
|
implementation.
Vmwgfx's hashtab implementation needs to be replaced with linux/hashtable
to reduce maintenence burden.
As part of this effort, refactor the res_ht hashtable used for resource
validation during execbuf execution to use linux/hashtable implementation.
This also refactors vmw_validation_context to use vmw_sw_context as the
container for the hashtable, whereas before it used a vmwgfx_open_hash
directly. This makes vmw_validation_context less generic, but there is
no functional change since res_ht is the only instance where validation
context used a hashtable in vmwgfx driver.
Signed-off-by: Maaz Mombasawala <mombasawalam@vmware.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221022040236.616490-6-zack@kde.org
|
|
The vmw_user_bo_noref_lookup() function cannot return NULL. If it
could, then this function would return PTR_ERR(NULL) which is success.
Returning success without initializing "*vmw_bo_p = vmw_bo;" would
lead to an uninitialized variable bug in the caller. Smatch complains
about this:
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:1177 vmw_translate_mob_ptr() warn: passing zero to 'PTR_ERR'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:1314 vmw_cmd_dx_bind_query() error: uninitialized symbol 'vmw_bo'.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YtZ9qrKeBqmmK8Hv@kili
|
|
First backmerge into drm-misc-next. Required for more helpers backmerged,
and to pull in 5.17 (rc2).
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
|
Decomposing fence containers don't seem to make any sense here.
So just remove the function entirely and call dma_fence_wait() directly.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220124130328.2376-12-christian.koenig@amd.com
|
|
A failing usercopy of the fence_rep object will lead to a stale entry in
the file descriptor table as put_unused_fd() won't release it. This
enables userland to refer to a dangling 'file' object through that still
valid file descriptor, leading to all kinds of use-after-free
exploitation scenarios.
Fix this by deferring the call to fd_install() until after the usercopy
has succeeded.
Fixes: c906965dee22 ("drm/vmwgfx: Add export fence to file descriptor support")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This adds support for the
SVGA_3D_CMD_DX_SET_VS/PS/GS/HS/DS/CS_CONSTANT_BUFFER_OFFSET commands (which only update
the offset, but don't rebind the buffer), which saves some overhead.
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211206172620.3139754-11-zack@kde.org
|
|
If the host supports SVGA3D_DEVCAP_GL43, we can handle 64 instead of
just 8 UAVs.
Based on a patch from Roland Scheidegger <sroland@vmware.com>.
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211206172620.3139754-9-zack@kde.org
|
|
This is going to be required for setting the sample count when
rendering with no attachments.
Also cleans up view handling (should fix depthstencil_v2).
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211206172620.3139754-8-zack@kde.org
|
|
This is initial change adding support for DRIVER_GEM to vmwgfx. vmwgfx
was written before GEM and has always used TTM. Over the years the
TTM buffers started inherting from GEM objects but vmwgfx never
implemented GEM making it quite awkward. We were directly setting
variables in GEM objects to not make DRM crash.
This change brings vmwgfx inline with other DRM drivers and allows us
to use a lot of DRM helpers which have depended on drivers with GEM
support.
Due to historical reasons vmwgfx splits the idea of a buffer and surface
which makes it a littly tricky since either one can be used in most
of our ioctl's which take user space handles. For now our BO's are
GEM objects and our surfaces are opaque objects which are backed by
GEM objects. In the future I'd like to combine those into a single
BO but we don't want to break any of our existing ioctl's so it will
take time to do it in a non-destructive way.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211206172620.3139754-5-zack@kde.org
|
|
vmwgfx shared very elaborate memory accounting with ttm. It was moved
from ttm to vmwgfx in change
f07069da6b4c ("drm/ttm: move memory accounting into vmwgfx v4")
but because of complexity it was hard to maintain. Some parts of the code
weren't freeing memory correctly and some were missing accounting all
together. While those would be fairly easy to fix the fundamental reason
for memory accounting in the driver was the ability to invoke shrinker
which is part of TTM code as well (with support for unified memory
hopefully coming soon).
That meant that vmwgfx had a lot of code that was either unused or
duplicating code from TTM. Removing this code also prevents excessive
calls to global swapout which were common during memory pressure
because both vmwgfx and TTM would invoke the shrinker when memory
usage reached half of RAM.
Fixes: f07069da6b4c ("drm/ttm: move memory accounting into vmwgfx v4")
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211206172620.3139754-2-zack@kde.org
|
|
Besides some legacy code, vmwgfx is the only user of DRM's hash-
table implementation. Copy the code into the driver, so that the
core code can be retired.
No functional changes. However, the real solution for vmwgfx is to
use Linux' generic hash-table functions.
v2:
* add TODO item for updating vmwgfx (Sam)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211129094841.22499-3-tzimmermann@suse.de
|
|
Historically our device headers have been forked versions of the
internal device headers, this has made maintaining them a bit
of a burden. To fix the situation, going forward, the device headers
will be verbatim copies of the internal headers.
To do that the driver code has to be adapted to use pristine
device headers. This will make future update to the device
headers trivial and automatic.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210615182336.995192-2-zackr@vmware.com
|
|
Fix some minor issues that Coverity spotted in the code. None
of that are serious but they're all valid concerns so fixing
them makes sense.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210609172307.131929-5-zackr@vmware.com
|
|
VMware mks-guest-stats mechanism allows the collection of performance stats from
guest userland GL contexts, as well as from vmwgfx kernelspace, via a set of sw-
defined performance counters. The userspace performance counters are (de)registerd
with vmware-vmx-stats hypervisor via new iocts. The vmwgfx kernelspace counters
are controlled at build-time via a new config DRM_VMWGFX_MKSSTATS.
* Add vmw_mksstat_{add|remove|reset}_ioctl controlling the tracking of
mks-guest-stats in guest winsys contexts
* Add DRM_VMWGFX_MKSSTATS config to drivers/gpu/drm/vmwgfx/Kconfig controlling
the instrumentation of vmwgfx for kernelspace mks-guest-stats counters
* Instrument vmwgfx vmw_execbuf_ioctl to collect mks-guest-stats according to
DRM_VMWGFX_MKSSTATS
Signed-off-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210609172307.131929-3-zackr@vmware.com
|
|
When we want to decouble resource management from buffer management we need to
be able to handle resources separately.
Add a resource pointer and rename bo->mem so that all code needs to
change to access the pointer instead.
No functional change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210430092508.60710-4-christian.koenig@amd.com
|
|
SVGA3 is the next version of our PCI device. Some of the changes
include using MMIO for register accesses instead of ioports,
deprecating the FIFO MMIO and removing a lot of the old and
legacy functionality. SVGA3 doesn't support guest backed
objects right now so everything except 3D is working.
v2: Fixes all the static analyzer warnings
Signed-off-by: Zack Rusin <zackr@vmware.com>
Cc: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210505191007.305872-1-zackr@vmware.com
|
|
Now since Christian reworked TTM to always keep objects on the LRU
list unless they are pinned we shouldn't need the reservation
semaphore. It makes the driver code a lot cleaner, especially
because it was a little hard to reason when and where the
reservation semaphore needed to be held.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210505035740.286923-5-zackr@vmware.com
|
|
The SVGA3dCmdDXGenMips command uses a shader-resource view to access
the underlying surface. Normally accesses using that view-type are not
dirtying the underlying surface, but that particular command is an
exception.
Mark the surface gpu-dirty after a SVGA3dCmdDXGenMips command has been
submitted.
This fixes the piglit getteximage-formats test run with
SVGA_FORCE_COHERENT=1
Fixes: a9f58c456e9d ("drm/vmwgfx: Be more restrictive when dirtying resources")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210505035740.286923-3-zackr@vmware.com
|
|
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201211085751.3089-1-zhengyongjun3@huawei.com
|
|
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:89: warning: Enum value 'vmw_res_rel_max' not described in enum 'vmw_resource_relocation_type'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:136: warning: Function parameter or member 'func' not described in 'vmw_cmd_entry'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:136: warning: Function parameter or member 'cmd_name' not described in 'vmw_cmd_entry'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:212: warning: Function parameter or member 'res' not described in 'vmw_cmd_ctx_first_setup'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:523: warning: Function parameter or member 'sw_context' not described in 'vmw_resource_relocation_add'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:523: warning: Excess function parameter 'list' description in 'vmw_resource_relocation_add'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:653: warning: Function parameter or member 'p_res' not described in 'vmw_cmd_res_check'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:653: warning: Excess function parameter 'p_val' description in 'vmw_cmd_res_check'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:1716: warning: Function parameter or member 'res' not described in 'vmw_cmd_res_switch_backup'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:1716: warning: Excess function parameter 'val_node' description in 'vmw_cmd_res_switch_backup'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:3757: warning: Function parameter or member 'file_priv' not described in 'vmw_execbuf_fence_commands'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:3757: warning: Function parameter or member 'dev_priv' not described in 'vmw_execbuf_fence_commands'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:3757: warning: Function parameter or member 'p_fence' not described in 'vmw_execbuf_fence_commands'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:3757: warning: Function parameter or member 'p_handle' not described in 'vmw_execbuf_fence_commands'
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c:3954: warning: Function parameter or member 'kernel_commands' not described in 'vmw_execbuf_cmdbuf'
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210115181313.3431493-4-lee.jones@linaro.org
|
|
Lets try to cleanup the usage of the term FIFO which we used for
both our MMIO based cmd queue processing and for general
command processing which could have been using command buffers
interface. We're going to rename the functions which are processing
commands (and work either via MMIO or command buffers) as _cmd_
and functions which operate on the MMIO based commands as FIFO
to match the SVGA device naming.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/414044/?series=85516&rev=2
|
|
Throttling was used before fencing to implement early vsync
support in the xorg state tracker a long time ago. The xorg
state tracker has been removed years ago and no one else
has ever used throttling. It's time to remove this code,
it hasn't been used or tested in years.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Link: https://patchwork.freedesktop.org/patch/414042/?series=85516&rev=2
|
|
Based on an idea from Dave, but cleaned up a bit.
We had multiple fields for essentially the same thing.
Now bo->base.size is the original size of the BO in
arbitrary units, usually bytes.
bo->mem.num_pages is the size in number of pages in the
resource domain of bo->mem.mem_type.
v2: use the GEM object size instead of the BO size
v3: fix printks in some places
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com> (v1)
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/406831/
|
|
There is a spelling mistake in a DRM_ERROR message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
|
|
Calculate GPU offset within vmwgfx driver itself without depending on
bo->offset.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/372933/
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
With SM5 capability a new version of streamoutput is supported by device
which need backing mob and a new field. With this change the new command
is supported in command buffer.
v2: Also track streamoutput context binding in binding manager.
v3: Track only one streamoutput as only one can be set to context.
v4: Fix comment typos
Signed-off-by: Deepak Rawat <drawat.floss@gmail.com>
Signed-off-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
|
|
Previous name vmw_ctx_bindinfo_so is misleading because it actually
represent so target and stream output is a new resource type that needs
tracking for SM5 capable device. Also rename binding type enum and
internal functions to reflect these belongs to so targets.
Signed-off-by: Deepak Rawat <drawat.floss@gmail.com>
Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
|
|
Validate indirect and dispatch commands in command buffer.
Signed-off-by: Deepak Rawat <drawat.floss@gmail.com>
Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
|
|
Virtual device now support new commands to manage unordered access
views. Allow them as part of user-space command buffer. This involves
adding UA view cotable, binding tracker info, new view type and command
verifier functions.
v2: fix comment typo
v3: style fixes (don't use deprecated PTR_RET)
Signed-off-by: Deepak Rawat <drawat.floss@gmail.com>
Signed-off-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
|
|
Virtual device now supports new shader types, allow them as valid shader
type in command buffer. Also add per shader bind info in binding manager
state for new shader type.
Signed-off-by: Deepak Rawat <drawat.floss@gmail.com>
Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
|
|
Instead of having different bool in device private to represent
incremental graphics context capabilities, add a new sm type enum.
v2: Use enum instead of bit flag.
v3: Incorporated review comments.
Signed-off-by: Deepak Rawat <drawat.floss@gmail.com>
Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
|
|
Logic ops commands are marked as deprecated by virtual device and were
never used by vmwgfx.
Signed-off-by: Deepak Rawat <drawat.floss@gmail.com>
Reviewed-by: Thomas Hellström (VMware) <thomas_os@shipmail.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
|
|
Commit 508108ea2747 ("drm/vmwgfx: Don't refcount command-buffer managed
resource lookups during command buffer validation") slips in use of
deprecated PTR_RET. Use PTR_ERR_OR_ZERO instead.
As the PTR_ERR_OR_ZERO is a bit longer than PTR_RET, we introduce
local variable ret for proper indentation and line-length limits.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|