summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2021-05-11drm/amdgpu: Rename to ras_*_enabledLuben Tuikov8-36/+36
Rename, ras_hw_supported --> ras_hw_enabled, and ras_features --> ras_enabled, to show that ras_enabled is a subset of ras_hw_enabled, which itself is a subset of the ASIC capability. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Move up ras_hw_supportedLuben Tuikov3-34/+28
Move ras_hw_supported into struct amdgpu_dev. The dependency is: struct amdgpu_ras <== struct amdgpu_dev <== ASIC, read as "struct amdgpu_ras depends on struct amdgpu_dev, which depends on the hardware." This can be loosely understood as, "if RAS is supported, which is property of the ASIC (struct amdgpu_dev), then we can access struct amdgpu_ras." v2: Fix a typo: must binary AND in ternary cond in amdgpu_ras.c Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Remove redundant ras->supportedLuben Tuikov6-15/+15
Remove redundant ras->supported, as this value is also stored in adev->ras_features. Use adev->ras_features, as that supercedes "ras", since the latter is its member. The dependency goes like this: ras <== adev->ras_features <== hw_supported, and is read as "ras depends on ras_features, which depends on hw_supported." The arrows show the flow of information, i.e. the dependency update. "hw_supported" should also live in "adev". Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: update vcn1.0 Non-DPG suspend sequenceSathishkumar S1-4/+9
update suspend register settings in Non-DPG mode. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: update the shader to clear specific SGPRsDennis Li1-44/+47
Add shader codes to explicitly clear specific SGPRs, such as flat_scratch_lo, flat_scratch_hi and so on. And also correct the allocation size of SGPRs in PGM_RSRC1. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Enable TCP channel hashing for AldebaranMukul Joshi2-7/+13
Enable TCP channel hashing to match DF hash settings for Aldebaran. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Use device specific BO size & stride check.Bas Nieuwenhuizen1-6/+175
The builtin size check isn't really the right thing for AMD modifiers due to a couple of reasons: 1) In the format structs we don't do set any of the tilesize / blocks etc. to avoid having format arrays per modifier/GPU 2) The pitch on the main plane is pixel_pitch * bytes_per_pixel even for tiled ... 3) The pitch for the DCC planes is really the pixel pitch of the main surface that would be covered by it ... Note that we only handle GFX9+ case but we do this after converting the implicit modifier to an explicit modifier, so on GFX9+ all framebuffers should be checked here. There is a TODO about DCC alignment, but it isn't worse than before and I'd need to dig a bunch into the specifics. Getting this out in a reasonable timeframe to make sure it gets the appropriate testing seemed more important. Finally as I've found that debugging addfb2 failures is a pita I was generous adding explicit error messages to every failure case. Fixes: f258907fdd83 ("drm/amdgpu: Verify bo size can fit framebuffer size on init.") Tested-by: Simon Ser <contact@emersion.fr> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.Bas Nieuwenhuizen1-0/+4
Otherwise tiling modes that require the values form this field (In particular _*_X) would be corrupted upon video decode. Copied from the VCN v2 code. Fixes: 99541f392b4d ("drm/amdgpu: add mc resume DPG mode for VCN3.0") Reviewed-and-Tested by: Leo Liu <leo.liu@amd.com> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: change the default timeout for kernel compute queuesAlex Deucher2-7/+5
Change to 60s. This matches what we already do in virtualization. Infinite timeout can lead to deadlocks in the kernel. Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: set vcn mgcg flag for picassoSathishkumar S1-1/+2
enable vcn mgcg flag for picasso. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amd/amdgpu/amdgpu_drv.c: Replace drm_modeset_lock_all with drm_modeset_lockFabio M. De Francesco1-6/+4
drm_modeset_lock_all() is not needed here, so it is replaced with drm_modeset_lock(). The crtc list around which we are looping never changes, therefore the only lock we need is to protect access to crtc->state. Suggested-by: Daniel Vetter <daniel@ffwll.ch> Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: drop the GCR packet from the emit_ib frame for sdma5.0Alex Deucher1-12/+0
It's not needed here and has been added to the proper place in the previous patch. This aligns with what we do for sdma 5.2. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Add graphics cache rinse packet for sdma 5.0Alex Deucher1-0/+28
Add emit mem sync callback for sdma_v5_0 In amdgpu sync object test, three threads created jobs to send GFX IB and SDMA IB in sequence. After the first GFX thread joined, sometimes the third thread will reuse the same physical page to store the SDMA IB. There will be a risk that SDMA will read GFX IB in the previous physical page. So it's better to flush the cache before commit sdma IB. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: force enable gfx ras for vega20 wsStanley.Yang1-0/+21
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: remove excess function parameterNirmoy Das1-1/+0
Fix below htmldocs build warning: "warning: Excess function parameter 'vm_context' description in 'amdgpu_vm_init'" Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Rename the flags to eliminate ambiguity v2Peng Ju Zhou1-3/+3
The flags vf_reg_access_* may cause confusion, rename the flags to make it more clear. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: add new MC firmware for Polaris12 32bit ASICEvan Quan1-3/+10
Polaris12 32bit ASIC needs a special MC firmware. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Add Aldebaran virtualization supportZhigang Luo2-3/+8
1. add Aldebaran in virtualization detection list. 2. disable Aldebaran virtual display support as there is no GFX engine in Aldebaran. 3. skip TMR loading if Aldebaran is in virtualizatin mode as it shares the one host loaded. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Add a new device ID for AldebaranZhigang Luo1-0/+1
It is Aldebaran VF device ID, for virtualization support. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: enable gfx ras in aldebran by defaultHawking Zhang1-1/+2
gfx ras now can be enabled by default in aldebaran Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: switch to mmhub ras callback for ras finiHawking Zhang1-1/+1
invoke callback function for mmhub ras fini Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: retired reset_ras_error_count from hdp callbacksHawking Zhang2-2/+0
It was moved to hdp ras callbacks Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: enable ras error count query and reset for HDPHawking Zhang3-3/+14
add hdp block ras error query and reset support in amdgpu ras error count query and reset interface Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: init/fini hdp v4_0 rasHawking Zhang1-0/+11
invoke hdp v4_0 ras init in gmc late_init phase while ras fini in gmc sw_fini phase Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: initialize hdp v4_0 ras functionsHawking Zhang1-0/+7
hdp v4_0 support ras features Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: implement hdp v4_0 ras functionsHawking Zhang2-2/+29
implement hdp v4_0 ras functions, including ras init/fini, query/reset_error_counter Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: add helpers for hdp ras init/finiHawking Zhang3-1/+72
hdp ras init/fini are common functions that can be shared among hdp generations Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: add hdp ras structuresHawking Zhang1-0/+10
centralize all hdp ras operation to ras_funcs Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_createSimon Ser1-0/+1
This error code-path is missing a drm_gem_object_put call. Other error code-paths are fine. Signed-off-by: Simon Ser <contact@emersion.fr> Fixes: 1769152ac64b ("drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Harry Wentland <hwentlan@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amd/amdgpu: Fix errors in documentation of function parametersFabio M. De Francesco3-11/+13
In the function documentation, I removed the excess parameters, described the undocumented ones, and fixed the syntax errors. Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Register VGA clients after init can no longer failKai-Heng Feng1-15/+13
When an amdgpu device fails to init, it makes another VGA device cause kernel splat: kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init kernel: amdgpu: probe of 0000:08:00.0 failed with error -110 ... kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 0 P4D 0 kernel: Oops: 0000 [#1] SMP NOPTI kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G W 5.12.0-rc8+ #12 kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020 kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu] kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0 kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000 kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246 kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000 kernel: FS: 00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0 kernel: Call Trace: kernel: vga_arbiter_notify_clients.part.0+0x4a/0x80 kernel: vga_get+0x17f/0x1c0 kernel: vga_arb_write+0x121/0x6a0 kernel: ? apparmor_file_permission+0x1c/0x20 kernel: ? security_file_permission+0x30/0x180 kernel: vfs_write+0xca/0x280 kernel: ksys_write+0x67/0xe0 kernel: __x64_sys_write+0x1a/0x20 kernel: do_syscall_64+0x38/0x90 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae kernel: RIP: 0033:0x7f93041e02f7 kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7 kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0 kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808 kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980 kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic kernel: CR2: 0000000000000018 kernel: ---[ end trace 76d04313d4214c51 ]--- Commit 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(), so the VGA clients remain registered. So when vga_arbiter_notify_clients() iterates over registered clients, it causes NULL pointer dereference. Since there's no reason to register VGA clients that early, so solve the issue by putting them after all the goto cleanups. v2: - Remove redundant vga_switcheroo cleanup in failed: label. Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardownPavan Kumar Ramayanam1-0/+3
The runtime resume PM op disregards the return value from amdgpu_device_resume(), masking errors for failed resumes at the PM layer. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Pavan Kumar Ramayanam <pavan.ramayanam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: fix r initial valuesVictor Zhao1-1/+1
Sriov gets suspend of IP block <dce_virtual> failed as return value was not initialized. v2: return 0 directly to align original code semantic before this was broken out into a separate helper function instead of setting initial values Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-07Merge tag 'amd-drm-fixes-5.13-2021-05-05' of ↵Dave Airlie8-33/+225
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-fixes-5.13-2021-05-05: amdgpu: - MPO hang workaround - Fix for concurrent VM flushes on vega/navi - dcefclk is not adjustable on navi1x and newer - MST HPD debugfs fix - Suspend/resumes fixes - Register VGA clients late in case driver fails to load - Fix GEM leak in user framebuffer create - Add support for polaris12 with 32 bit memory interface - Fix duplicate cursor issue when using overlay - Fix corruption with tiled surfaces on VCN3 - Add BO size and stride check to fix BO size verification radeon: - Fix off-by-one in power state parsing - Fix possible memory leak in power state parsing Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210506033929.3875-1-alexander.deucher@amd.com
2021-05-06drm/amdgpu: Use device specific BO size & stride check.Bas Nieuwenhuizen1-6/+175
The builtin size check isn't really the right thing for AMD modifiers due to a couple of reasons: 1) In the format structs we don't do set any of the tilesize / blocks etc. to avoid having format arrays per modifier/GPU 2) The pitch on the main plane is pixel_pitch * bytes_per_pixel even for tiled ... 3) The pitch for the DCC planes is really the pixel pitch of the main surface that would be covered by it ... Note that we only handle GFX9+ case but we do this after converting the implicit modifier to an explicit modifier, so on GFX9+ all framebuffers should be checked here. There is a TODO about DCC alignment, but it isn't worse than before and I'd need to dig a bunch into the specifics. Getting this out in a reasonable timeframe to make sure it gets the appropriate testing seemed more important. Finally as I've found that debugging addfb2 failures is a pita I was generous adding explicit error messages to every failure case. Fixes: f258907fdd83 ("drm/amdgpu: Verify bo size can fit framebuffer size on init.") Tested-by: Simon Ser <contact@emersion.fr> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-06drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.Bas Nieuwenhuizen1-0/+4
Otherwise tiling modes that require the values form this field (In particular _*_X) would be corrupted upon video decode. Copied from the VCN v2 code. Fixes: 99541f392b4d ("drm/amdgpu: add mc resume DPG mode for VCN3.0") Reviewed-and-Tested by: Leo Liu <leo.liu@amd.com> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-05-05drm/amdgpu: Add show_fdinfo() interfaceRoy Sun11-2/+288
Tracking devices, process info and fence info using /proc/pid/fdinfo Signed-off-by: David M Nieto <David.Nieto@amd.com> Signed-off-by: Roy Sun <Roy.Sun@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210426062701.39732-2-Roy.Sun@amd.com
2021-05-05drm/amdgpu: add new MC firmware for Polaris12 32bit ASICEvan Quan1-3/+10
Polaris12 32bit ASIC needs a special MC firmware. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-05-03drm/ttm: always initialize the full ttm_resource v2Christian König1-2/+0
Init all fields in ttm_resource_alloc() when we create a new resource. v2: use place->mem_type instead of res->mem_type Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210430092508.60710-2-christian.koenig@amd.com
2021-04-29amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_createSimon Ser1-0/+1
This error code-path is missing a drm_gem_object_put call. Other error code-paths are fine. Signed-off-by: Simon Ser <contact@emersion.fr> Fixes: 1769152ac64b ("drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Harry Wentland <hwentlan@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-04-29drm/amdgpu: Register VGA clients after init can no longer failKai-Heng Feng1-15/+13
When an amdgpu device fails to init, it makes another VGA device cause kernel splat: kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init kernel: amdgpu: probe of 0000:08:00.0 failed with error -110 ... kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 0 P4D 0 kernel: Oops: 0000 [#1] SMP NOPTI kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G W 5.12.0-rc8+ #12 kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020 kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu] kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0 kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000 kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246 kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000 kernel: FS: 00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0 kernel: Call Trace: kernel: vga_arbiter_notify_clients.part.0+0x4a/0x80 kernel: vga_get+0x17f/0x1c0 kernel: vga_arb_write+0x121/0x6a0 kernel: ? apparmor_file_permission+0x1c/0x20 kernel: ? security_file_permission+0x30/0x180 kernel: vfs_write+0xca/0x280 kernel: ksys_write+0x67/0xe0 kernel: __x64_sys_write+0x1a/0x20 kernel: do_syscall_64+0x38/0x90 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae kernel: RIP: 0033:0x7f93041e02f7 kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7 kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0 kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808 kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980 kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic kernel: CR2: 0000000000000018 kernel: ---[ end trace 76d04313d4214c51 ]--- Commit 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(), so the VGA clients remain registered. So when vga_arbiter_notify_clients() iterates over registered clients, it causes NULL pointer dereference. Since there's no reason to register VGA clients that early, so solve the issue by putting them after all the goto cleanups. v2: - Remove redundant vga_switcheroo cleanup in failed: label. Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardownPavan Kumar Ramayanam1-0/+3
The runtime resume PM op disregards the return value from amdgpu_device_resume(), masking errors for failed resumes at the PM layer. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Pavan Kumar Ramayanam <pavan.ramayanam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29drm/amdgpu: fix r initial valuesVictor Zhao1-1/+1
Sriov gets suspend of IP block <dce_virtual> failed as return value was not initialized. v2: return 0 directly to align original code semantic before this was broken out into a separate helper function instead of setting initial values Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-04-29drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2Christian König3-8/+18
Starting with Vega the hardware supports concurrent flushes of VMID which can be used to implement per process VMID allocation. But concurrent flushes are mutual exclusive with back to back VMID allocations, fix this to avoid a VMID used in two ways at the same time. v2: don't set ring to NULL Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-04-29drm/amdgpu: fix no full coverage issue for gprs initializationDennis Li2-265/+516
The wave's number per simd in aldebaran is changed to 8, so it is impossible to use old algorithm to initiate all sgprs with one threadgroup. The new algorithm firstly use three threadgroups to initiate most sgprs simultaneously and then use another threadgroup with 4 waves to cover other uninitiated sgprs. v2: Add more description about the new algorithm to clear sgprs and add some comment for shader binaries Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29drm/amdgpu: address remove from fault filterPhilip Yang2-6/+48
Add interface to remove address from fault filter ring by resetting fault ring entry key, then future vm fault on the address will be processed to recover. Define fault key as atomic64_t type to use atomic read/set/cmpxchg key to protect fault ring access by interrupt handler and interrupt deferred work for vg20. Change fault->timestamp to 48-bit to share same uint64_t with 8-bit fault->next, it is enough for 48bit IH timestamp. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29drm/amdgpu: return IH ring drain finished if ring is emptyPhilip Yang1-1/+3
Sometimes IH do not setup ring wptr overflow flag after wptr exceed rptr. As a workaround, if IH rptr equals to wptr, ring is empty, return true to indicate IH ring checkpoint is processed, IH ring drain is finished. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29drm/amd/amdgpu/sriov disable all ip hw status by defaultJack Zhang1-1/+1
Disable all ip's hw status to false before any hw_init. Only set it to true until its hw_init is executed. The old 5.9 branch has this change but somehow the 5.11 kernrel does not have this fix. Without this change, sriov tdr have gfx IB test fail. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Review-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29drm/amdgpu: restructure amdgpu_vram_mgr_newChristian König1-31/+27
Merge the two loops, loosen the restriction for big allocations. This reduces the CPU overhead in the good case, but increases it a bit under memory pressure. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-and-Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29drm/amdgpu: Correct and simplify sdma 4.x irq.num_typesFeifei Xu1-16/+7
Correct and init the sdma4.x irq.num_types. v2: squash in fix (Alex) Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>