diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-10 23:58:46 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-10 23:58:46 +0300 |
commit | 8d0749b4f83bf4768ceae45ee6a79e6e7eddfc2a (patch) | |
tree | 069cc92e93982e0b921c09e71df6f7b68b4cbfa2 /drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | |
parent | bf4eebf8cfa2cd50e20b7321dfb3effdcdc6e909 (diff) | |
parent | cb6846fbb83b574c85c2a80211b402a6347b60b1 (diff) | |
download | linux-8d0749b4f83bf4768ceae45ee6a79e6e7eddfc2a.tar.xz |
Merge tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights are support for privacy screens found in new laptops, a
bunch of nomodeset refactoring, and i915 enables ADL-P systems by
default, while starting to add RPL-S support.
vmwgfx adds GEM and support for OpenGL 4.3 features in userspace.
Lots of internal refactorings around dma reservations, and lots of
driver refactoring as well.
Summary:
core:
- add privacy screen support
- move nomodeset option into drm subsystem
- clean up nomodeset handling in drivers
- make drm_irq.c legacy
- fix stack_depot name conflicts
- remove DMA_BUF_SET_NAME ioctl restrictions
- sysfs: send hotplug event
- replace several DRM_* logging macros with drm_*
- move hashtable to legacy code
- add error return from gem_create_object
- cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER
- kernel.h related include cleanups
- support XRGB2101010 source buffers
ttm:
- don't include drm hashtable
- stop pruning fences after wait
- documentation updates
dma-buf:
- add dma_resv selftest
- add debugfs helpers
- remove dma_resv_get_excl_unlocked
- documentation
- make fences mandatory in dma_resv_add_excl_fence
dp:
- add link training delay helpers
gem:
- link shmem/cma helpers into separate modules
- use dma_resv iteratior
- import dma-buf namespace into gem helper modules
scheduler:
- fence grab fix
- lockdep fixes
bridge:
- switch to managed MIPI DSI helpers
- register and attach during probe fixes
- convert to YAML in several places.
panel:
- add bunch of new panesl
simpledrm:
- support FB_DAMAGE_CLIPS
- support virtual screen sizes
- add Apple M1 support
amdgpu:
- enable seamless boot for DCN 3.01
- runtime PM fixes
- use drm_kms_helper_connector_hotplug_event
- get all fences at once
- use generic drm fb helpers
- PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes
- add smart trace buffer (STB) for supported GPUs
- display debugfs entries
- new SMU debug option
- Documentation update
amdkfd:
- IP discovery enumeration refactor
- interface between driver fixes
- SVM fixes
- kfd uapi header to define some sysfs bitfields.
i915:
- support VESA panel backlights
- enable ADL-P by default
- add eDP privacy screen support
- add Raptor Lake S (RPL-S) support
- DG2 page table support
- lots of GuC/HuC fw refactoring
- refactored i915->gt interfaces
- CD clock squashing support
- enable 10-bit gamma support
- update ADL-P DMC fw to v2.14
- enable runtime PM autosuspend by default
- ADL-P DSI support
- per-lane DP drive settings for ICL+
- add support for pipe C/D DMC firmware
- Atomic gamma LUT updates
- remove CCS FB stride restrictions on ADL-P
- VRR platform support for display 11
- add support for display audio codec keepalive
- lots of display refactoring
- fix runtime PM handling during PXP suspend
- improved eviction performance with async TTM moves
- async VMA unbinding improvements
- VMA locking refactoring
- improved error capture robustness
- use per device iommu checks
- drop bits stealing from i915_sw_fence function ptr
- remove dma_resv_prune
- add IC cache invalidation on DG2
nouveau:
- crc fixes
- validate LUTs in atomic check
- set HDMI AVI RGB quant to full
tegra:
- buffer objects reworks for dma-buf compat
- NVDEC driver uAPI support
- power management improvements
etnaviv:
- IOMMU enabled system support
- fix > 4GB command buffer mapping
- close a DoS vector
- fix spurious GPU resets
ast:
- fix i2c initialization
rcar-du:
- DSI output support
exynos:
- replace legacy gpio interface
- implement generic GEM object mmap
msm:
- dpu plane state cleanup in prep for multirect
- dpu debugfs cleanups
- dp support for sc7280
- a506 support
- removal of struct_mutex
- remove old eDP sub-driver
anx7625:
- support MIPI DSI input
- support HDMI audio
- fix reading EDID
lvds:
- fix bridge DT bindings
megachips:
- probe both bridges before registering
dw-hdmi:
- allow interlace on bridge
ps8640:
- enable runtime PM
- support aux-bus
tx358768:
- enable reference clock
- add pulse mode support
ti-sn65dsi86:
- use regmap bulk write
- add PWM support
etnaviv:
- get all fences at once
gma500:
- gem object cleanups
kmb:
- enable fb console
radeon:
- use dma_resv_wait_timeout
rockchip:
- add DSP hold timeout
- suspend/resume fixes
- PLL clock fixes
- implement mmap in GEM object functions
- use generic fbdev emulation
sun4i:
- use CMA helpers without vmap support
vc4:
- fix HDMI-CEC hang with display is off
- power on HDMI controller while disabling
- support 4K@60Hz modes
- support 10-bit YUV 4:2:0 output
vmwgfx:
- fix leak on probe errors
- fail probing on broken hosts
- new placement for MOB page tables
- hide internal BOs from userspace
- implement GEM support
- implement GL 4.3 support
virtio:
- overflow fixes
xen:
- implement mmap as GEM object function
omapdrm:
- fix scatterlist export
- support virtual planes
mediatek:
- MT8192 support
- CMDQ refinement"
* tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm: (1241 commits)
drm/amdgpu: no DC support for headless chips
drm/amd/display: fix dereference before NULL check
drm/amdgpu: always reset the asic in suspend (v2)
drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
drm/amd/display: Fix the uninitialized variable in enable_stream_features()
drm/amdgpu: fix runpm documentation
amdgpu/pm: Make sysfs pm attributes as read-only for VFs
drm/amdgpu: save error count in RAS poison handler
drm/amdgpu: drop redundant semicolon
drm/amd/display: get and restore link res map
drm/amd/display: support dynamic HPO DP link encoder allocation
drm/amd/display: access hpo dp link encoder only through link resource
drm/amd/display: populate link res in both detection and validation
drm/amd/display: define link res and make it accessible to all link interfaces
drm/amd/display: 3.2.167
drm/amd/display: [FW Promotion] Release 0.0.98
drm/amd/display: Undo ODM combine
drm/amd/display: Add reg defs for DCN303
drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
drm/amd/display: Set optimize_pwr_state for DCN31
...
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c')
-rw-r--r-- | drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 73 |
1 files changed, 56 insertions, 17 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 75dad0214dc7..91e6e87562ac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -867,9 +867,9 @@ static int amdgpu_ras_enable_all_features(struct amdgpu_device *adev, /* feature ctl end */ -void amdgpu_ras_mca_query_error_status(struct amdgpu_device *adev, - struct ras_common_if *ras_block, - struct ras_err_data *err_data) +static void amdgpu_ras_mca_query_error_status(struct amdgpu_device *adev, + struct ras_common_if *ras_block, + struct ras_err_data *err_data) { switch (ras_block->sub_block_index) { case AMDGPU_RAS_MCA_BLOCK__MP0: @@ -892,6 +892,38 @@ void amdgpu_ras_mca_query_error_status(struct amdgpu_device *adev, } } +static void amdgpu_ras_get_ecc_info(struct amdgpu_device *adev, struct ras_err_data *err_data) +{ + struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); + int ret = 0; + + /* + * choosing right query method according to + * whether smu support query error information + */ + ret = smu_get_ecc_info(&adev->smu, (void *)&(ras->umc_ecc)); + if (ret == -EOPNOTSUPP) { + if (adev->umc.ras_funcs && + adev->umc.ras_funcs->query_ras_error_count) + adev->umc.ras_funcs->query_ras_error_count(adev, err_data); + + /* umc query_ras_error_address is also responsible for clearing + * error status + */ + if (adev->umc.ras_funcs && + adev->umc.ras_funcs->query_ras_error_address) + adev->umc.ras_funcs->query_ras_error_address(adev, err_data); + } else if (!ret) { + if (adev->umc.ras_funcs && + adev->umc.ras_funcs->ecc_info_query_ras_error_count) + adev->umc.ras_funcs->ecc_info_query_ras_error_count(adev, err_data); + + if (adev->umc.ras_funcs && + adev->umc.ras_funcs->ecc_info_query_ras_error_address) + adev->umc.ras_funcs->ecc_info_query_ras_error_address(adev, err_data); + } +} + /* query/inject/cure begin */ int amdgpu_ras_query_error_status(struct amdgpu_device *adev, struct ras_query_if *info) @@ -905,15 +937,7 @@ int amdgpu_ras_query_error_status(struct amdgpu_device *adev, switch (info->head.block) { case AMDGPU_RAS_BLOCK__UMC: - if (adev->umc.ras_funcs && - adev->umc.ras_funcs->query_ras_error_count) - adev->umc.ras_funcs->query_ras_error_count(adev, &err_data); - /* umc query_ras_error_address is also responsible for clearing - * error status - */ - if (adev->umc.ras_funcs && - adev->umc.ras_funcs->query_ras_error_address) - adev->umc.ras_funcs->query_ras_error_address(adev, &err_data); + amdgpu_ras_get_ecc_info(adev, &err_data); break; case AMDGPU_RAS_BLOCK__SDMA: if (adev->sdma.funcs->query_ras_error_count) { @@ -1137,9 +1161,9 @@ int amdgpu_ras_error_inject(struct amdgpu_device *adev, /** * amdgpu_ras_query_error_count -- Get error counts of all IPs - * adev: pointer to AMD GPU device - * ce_count: pointer to an integer to be set to the count of correctible errors. - * ue_count: pointer to an integer to be set to the count of uncorrectible + * @adev: pointer to AMD GPU device + * @ce_count: pointer to an integer to be set to the count of correctible errors. + * @ue_count: pointer to an integer to be set to the count of uncorrectible * errors. * * If set, @ce_count or @ue_count, count and return the corresponding @@ -1723,6 +1747,16 @@ static void amdgpu_ras_log_on_err_counter(struct amdgpu_device *adev) if (info.head.block == AMDGPU_RAS_BLOCK__PCIE_BIF) continue; + /* + * this is a workaround for aldebaran, skip send msg to + * smu to get ecc_info table due to smu handle get ecc + * info table failed temporarily. + * should be removed until smu fix handle ecc_info table. + */ + if ((info.head.block == AMDGPU_RAS_BLOCK__UMC) && + (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 2))) + continue; + amdgpu_ras_query_error_status(adev, &info); } } @@ -1935,9 +1969,11 @@ int amdgpu_ras_save_bad_pages(struct amdgpu_device *adev) if (!con || !con->eh_data) return 0; + mutex_lock(&con->recovery_lock); control = &con->eeprom_control; data = con->eh_data; save_count = data->count - control->ras_num_recs; + mutex_unlock(&con->recovery_lock); /* only new entries are saved */ if (save_count > 0) { if (amdgpu_ras_eeprom_append(control, @@ -2336,7 +2372,11 @@ int amdgpu_ras_init(struct amdgpu_device *adev) } /* Init poison supported flag, the default value is false */ - if (adev->df.funcs && + if (adev->gmc.xgmi.connected_to_cpu) { + /* enabled by default when GPU is connected to CPU */ + con->poison_supported = true; + } + else if (adev->df.funcs && adev->df.funcs->query_ras_poison_mode && adev->umc.ras_funcs && adev->umc.ras_funcs->query_ras_poison_mode) { @@ -2477,7 +2517,6 @@ void amdgpu_ras_late_fini(struct amdgpu_device *adev, amdgpu_ras_sysfs_remove(adev, ras_block); if (ih_info->cb) amdgpu_ras_interrupt_remove_handler(adev, ih_info); - amdgpu_ras_feature_enable(adev, ras_block, 0); } /* do some init work after IP late init as dependence. |