commit    e058a84bfddc42ba356a2316f2cf1141974625c9
author    Linus Torvalds <torvalds@linux-foundation.org>	2021-07-01 22:53:43 +0300
committer Linus Torvalds <torvalds@linux-foundation.org>	2021-07-01 22:53:43 +0300
tree      e6a02dd913e83f44ea9f5a779f9b9bd56d06a9e3 /drivers/gpu/drm/msm/adreno/a6xx_gpu.c
parent    c288d9cd710433e5991d58a0764c4d08a933b871
parent    8a02ea42bc1d4c448caf1bab0e05899dad503f74
Merge tag 'drm-next-2021-07-01' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights:
- AMD enables two more GPUs, with resulting header files
- i915 has started to move to TTM for discrete GPU and enable DG1
discrete GPU support (not by default yet)
- new HyperV drm driver
- vmwgfx adds arm64 support
- TTM refactoring ongoing
- 16bpc display support for AMD hw
Otherwise it's just the usual insane amounts of work all over the
place in lots of drivers and the core, as mostly summarised below:
Core:
- mark AGP ioctls as legacy
- disable force probing for non-master clients
- HDR metadata property helpers
- HDMI infoframe signal colorimetry support
- remove drm_device.pdev pointer
- remove DRM_KMS_FB_HELPER config option
- remove drm_pci_alloc/free
- drm_err_*/drm_dbg_* helpers
- use drm driver names for fbdev
- leaked DMA handle fix
- 16bpc fixed point format fourcc
- add prefetching memcpy for WC
- Documentation fixes
aperture:
- add aperture ownership helpers
dp:
- aux fixes
- downstream 0 port handling
- use extended base receiver capability DPCD
- Rename DP_PSR_SELECTIVE_UPDATE to better match eDP spec
- mst: use khz as link rate during init
- VCPI fixes for StarTech hub
ttm:
- provide tt_shrink file via debugfs
- warn about freeing pinned BOs
- fix swapping error handling
- move page alignment into BO
- cleanup ttm_agp_backend
- add ttm_sys_manager
- don't override vm_ops
- ttm_bo_mmap removed
- make ttm_resource base of all managers
- remove VM_MIXEDMAP usage
panel:
- sysfs_emit support
- simple: runtime PM support
- simple: power up panel when reading EDID + caching
bridge:
- MHDP8546: HDCP support + DT bindings
- MHDP8546: Register DP AUX channel with userspace
- TI SN65DSI83 + SN65DSI84: add driver
- Sil8620: Fix module dependencies
- dw-hdmi: make CEC driver loading optional
- Ti-sn65dsi86: refclk fixes, subdrivers, runtime pm
- It66121: Add driver + DT bindings
- Adv7511: Support I2S IEC958 encoding
- Anx7625: fix power-on delay
- Nwi-dsi: Modesetting fixes; Cleanups
- lt9611: add missing MODULE_DEVICE_TABLE
- cdns: fix PM reference leak
hyperv:
- add new DRM driver for HyperV graphics
efifb:
- non-PCI device handling fixes
i915:
- refactor IP/device versioning
- XeLPD Display IP preparation work
- ADL-P enablement patches
- DG1 uAPI behind BROKEN
- disable mmap ioctl for discrete GPUs
- start enabling HuC loading for Gen12+
- major GuC backend rework for new platforms
- initial TTM support for Discrete GPUs
- locking rework for TTM prep
- use correct max source link rate for eDP
- %p4cc format printing
- GLK display fixes
- VLV DSI panel power fixes
- PSR2 disabled for RKL and ADL-S
- ACPI _DSM invalid access fixed
- DMC FW path abstraction
- ADL-S PCI ID update
- uAPI headers converted to kerneldoc
- initial LMEM support for DG1
- x86/gpu: add Jasperlake to gen11 early quirks
amdgpu:
- Aldebaran updates + initial SR-IOV
- new GPU: Beige Goby and Yellow Carp support
- more LTTPR display work
- Vangogh updates
- SDMA 5.x GCR fixes
- PCIe ASPM support
- Renoir TMZ enablement
- initial multiple eDP panel support
- use fdinfo to track devices/process info
- pin/unpin TTM fixes
- free resource on fence usage query
- fix fence calculation
- fix hotunplug/suspend issues
- GC/MM register access macro cleanup for SR-IOV
- W=1 fixes
- ACPI ATCS/ATIF handling rework
- 16bpc fixed point format support
- Initial smartshift support
- RV/PCO power tuning fixes
- new INFO query for additional vbios info
amdkfd:
- SR-IOV aldebaran support
- HMM SVM support
radeon:
- SMU regression fixes
- Oland flickering fix
vmwgfx:
- enable console with fbdev emulation
- fix cpu updates of coherent multisample surfaces
- remove reservation semaphore
- add initial SVGA3 support
- support arm64
msm:
- devcoredump support for display errors
- dpu/dsi: yaml bindings conversion
- mdp5: alpha/blend_mode/zpos support
- a6xx: cached coherent buffer support
- gpu iova fault improvement
- a660 support
rockchip:
- RK3036 win1 scaling support
- RK3066/3188 missing register support
- RK3036/3066/3126/3188 alpha support
mediatek:
- MT8167 HDMI support
- MT8183 DPI dual edge support
tegra:
- fixed YUV support/scaling on Tegra186+
ast:
- use pcim_iomap
- fix DP501 EDID
bochs:
- screen blanking support
etnaviv:
- export more GPU ID values to userspace
- add HWDB entry for GPU on i.MX8MP
- rework linear window calcs
exynos:
- pm runtime changes
imx:
- Annotate dma_fence critical section
- fix PRG modifiers after drmm conversion
- Add 8 pixel alignment fix for 1366x768
- fix YUV advertising
- add color properties
ingenic:
- IPU planes fix
panfrost:
- Mediatek MT8183 support + DT bindings
- export AFBC_FEATURES register to userspace
simpledrm:
- %pr for printing resources
nouveau:
- pin/unpin TTM fixes
qxl:
- unpin shadow BO
virtio:
- create dumb BOs as guest blob
vkms:
- drmm_universal_plane_alloc
- add XRGB plane composition
- overlay support"
* tag 'drm-next-2021-07-01' of git://anongit.freedesktop.org/drm/drm: (1570 commits)
drm/i915: Reinstate the mmap ioctl for some platforms
drm/i915/dsc: abstract helpers to get bigjoiner primary/secondary crtc
Revert "drm/msm/mdp5: provide dynamic bandwidth management"
drm/msm/mdp5: provide dynamic bandwidth management
drm/msm/mdp5: add perf blocks for holding fudge factors
drm/msm/mdp5: switch to standard zpos property
drm/msm/mdp5: add support for alpha/blend_mode properties
drm/msm/mdp5: use drm_plane_state for pixel blend mode
drm/msm/mdp5: use drm_plane_state for storing alpha value
drm/msm/mdp5: use drm atomic helpers to handle base drm plane state
drm/msm/dsi: do not enable PHYs when called for the slave DSI interface
drm/msm: Add debugfs to trigger shrinker
drm/msm/dpu: Avoid ABBA deadlock between IRQ modules
drm/msm: devcoredump iommu fault support
iommu/arm-smmu-qcom: Add stall support
drm/msm: Improve the a6xx page fault handler
iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info
iommu/arm-smmu: Add support for driver IOMMU fault handlers
drm/msm: export hangcheck_period in debugfs
drm/msm/a6xx: add support for Adreno 660 GPU
...
Diffstat (limited to 'drivers/gpu/drm/msm/adreno/a6xx_gpu.c')
-rw-r--r--  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 305
1 file changed, 258 insertions(+), 47 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index f6c1b62b901e..9c5e4618aa0a 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -149,7 +149,7 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 
 	a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
 
-	get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
+	get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0),
 		rbmemptr_stats(ring, index, cpcycles_start));
 
 	/*
@@ -185,7 +185,7 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 		}
 	}
 
-	get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
+	get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0),
 		rbmemptr_stats(ring, index, cpcycles_end));
 	get_stats_counter(ring, REG_A6XX_CP_ALWAYS_ON_COUNTER_LO,
 		rbmemptr_stats(ring, index, alwayson_end));
@@ -427,6 +427,59 @@ const struct adreno_reglist a650_hwcg[] = {
 	{},
 };
 
+const struct adreno_reglist a660_hwcg[] = {
+	{REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x02222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x02222220},
+	{REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x00000080},
+	{REG_A6XX_RBBM_CLOCK_HYST_SP0, 0x0000F3CF},
+	{REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x00022222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x11111111},
+	{REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x11111111},
+	{REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x11111111},
+	{REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x00011111},
+	{REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x77777777},
+	{REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x77777777},
+	{REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x77777777},
+	{REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x00077777},
+	{REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x01002222},
+	{REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x00002220},
+	{REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x00040F00},
+	{REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x25222022},
+	{REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x00005555},
+	{REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x00000011},
+	{REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00445044},
+	{REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x04222222},
+	{REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x00002222},
+	{REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x00222222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x00000002},
+	{REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x00002222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x00004000},
+	{REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x00002222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x00000200},
+	{REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
+	{REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_CNTL_TEX_FCHE, 0x00000222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_TEX_FCHE, 0x00000111},
+	{REG_A6XX_RBBM_CLOCK_HYST_TEX_FCHE, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x00000004},
+	{REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x00000002},
+	{REG_A6XX_RBBM_ISDB_CNT, 0x00000182},
+	{REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x00000000},
+	{REG_A6XX_RBBM_SP_HYST_CNT, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x00000222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x00000111},
+	{REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x00000555},
+	{},
+};
+
 static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -541,6 +594,51 @@ static const u32 a650_protect[] = {
 	A6XX_PROTECT_NORDWR(0x1f8c0, 0x0000), /* note: infinite range */
 };
 
+/* These are for a635 and a660 */
+static const u32 a660_protect[] = {
+	A6XX_PROTECT_RDONLY(0x00000, 0x04ff),
+	A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+	A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+	A6XX_PROTECT_NORDWR(0x0050e, 0x0000),
+	A6XX_PROTECT_NORDWR(0x00510, 0x0000),
+	A6XX_PROTECT_NORDWR(0x00534, 0x0000),
+	A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+	A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+	A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+	A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+	A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+	A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+	A6XX_PROTECT_NORDWR(0x00e00, 0x0001),
+	A6XX_PROTECT_NORDWR(0x00e03, 0x000c),
+	A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+	A6XX_PROTECT_RDONLY(0x03cc4, 0x1fff),
+	A6XX_PROTECT_NORDWR(0x08630, 0x01cf),
+	A6XX_PROTECT_NORDWR(0x08e00, 0x0000),
+	A6XX_PROTECT_NORDWR(0x08e08, 0x0000),
+	A6XX_PROTECT_NORDWR(0x08e50, 0x001f),
+	A6XX_PROTECT_NORDWR(0x08e80, 0x027f),
+	A6XX_PROTECT_NORDWR(0x09624, 0x01db),
+	A6XX_PROTECT_NORDWR(0x09e60, 0x0011),
+	A6XX_PROTECT_NORDWR(0x09e78, 0x0187),
+	A6XX_PROTECT_NORDWR(0x0a630, 0x01cf),
+	A6XX_PROTECT_NORDWR(0x0ae02, 0x0000),
+	A6XX_PROTECT_NORDWR(0x0ae50, 0x012f),
+	A6XX_PROTECT_NORDWR(0x0b604, 0x0000),
+	A6XX_PROTECT_NORDWR(0x0b608, 0x0006),
+	A6XX_PROTECT_NORDWR(0x0be02, 0x0001),
+	A6XX_PROTECT_NORDWR(0x0be20, 0x015f),
+	A6XX_PROTECT_NORDWR(0x0d000, 0x05ff),
+	A6XX_PROTECT_NORDWR(0x0f000, 0x0bff),
+	A6XX_PROTECT_RDONLY(0x0fc00, 0x1fff),
+	A6XX_PROTECT_NORDWR(0x18400, 0x1fff),
+	A6XX_PROTECT_NORDWR(0x1a400, 0x1fff),
+	A6XX_PROTECT_NORDWR(0x1f400, 0x0443),
+	A6XX_PROTECT_RDONLY(0x1f844, 0x007b),
+	A6XX_PROTECT_NORDWR(0x1f860, 0x0000),
+	A6XX_PROTECT_NORDWR(0x1f887, 0x001b),
+	A6XX_PROTECT_NORDWR(0x1f8c0, 0x0000), /* note: infinite range */
+};
+
 static void a6xx_set_cp_protect(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -554,6 +652,10 @@ static void a6xx_set_cp_protect(struct msm_gpu *gpu)
 		regs = a650_protect;
 		count = ARRAY_SIZE(a650_protect);
 		count_max = 48;
+	} else if (adreno_is_a660(adreno_gpu)) {
+		regs = a660_protect;
+		count = ARRAY_SIZE(a660_protect);
+		count_max = 48;
 	}
 
 	/*
@@ -584,7 +686,7 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
 	if (adreno_is_a640(adreno_gpu))
 		amsbc = 1;
 
-	if (adreno_is_a650(adreno_gpu)) {
+	if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu)) {
 		/* TODO: get ddr type from bootloader and use 2 for LPDDR4 */
 		lower_bit = 3;
 		amsbc = 1;
@@ -648,6 +750,11 @@ static bool a6xx_ucode_check_version(struct a6xx_gpu *a6xx_gpu,
 	 * Targets up to a640 (a618, a630 and a640) need to check for a
 	 * microcode version that is patched to support the whereami opcode or
 	 * one that is new enough to include it by default.
+	 *
+	 * a650 tier targets don't need whereami but still need to be
+	 * equal to or newer than 0.95 for other security fixes
+	 *
+	 * a660 targets have all the critical security fixes from the start
 	 */
 	if (adreno_is_a618(adreno_gpu) || adreno_is_a630(adreno_gpu) ||
 		adreno_is_a640(adreno_gpu)) {
@@ -671,27 +778,20 @@ static bool a6xx_ucode_check_version(struct a6xx_gpu *a6xx_gpu,
 		DRM_DEV_ERROR(&gpu->pdev->dev,
 			"a630 SQE ucode is too old. Have version %x need at least %x\n",
 			buf[0] & 0xfff, 0x190);
-	} else {
-		/*
-		 * a650 tier targets don't need whereami but still need to be
-		 * equal to or newer than 0.95 for other security fixes
-		 */
-		if (adreno_is_a650(adreno_gpu)) {
-			if ((buf[0] & 0xfff) >= 0x095) {
-				ret = true;
-				goto out;
-			}
-
-			DRM_DEV_ERROR(&gpu->pdev->dev,
-				"a650 SQE ucode is too old. Have version %x need at least %x\n",
-				buf[0] & 0xfff, 0x095);
+	} else if (adreno_is_a650(adreno_gpu)) {
+		if ((buf[0] & 0xfff) >= 0x095) {
+			ret = true;
+			goto out;
 		}
 
-		/*
-		 * When a660 is added those targets should return true here
-		 * since those have all the critical security fixes built in
-		 * from the start
-		 */
+		DRM_DEV_ERROR(&gpu->pdev->dev,
+			"a650 SQE ucode is too old. Have version %x need at least %x\n",
+			buf[0] & 0xfff, 0x095);
+	} else if (adreno_is_a660(adreno_gpu)) {
+		ret = true;
+	} else {
+		DRM_DEV_ERROR(&gpu->pdev->dev,
+			"unknown GPU, add it to a6xx_ucode_check_version()!!\n");
 	}
 out:
 	msm_gem_put_vaddr(obj);
@@ -727,8 +827,8 @@ static int a6xx_ucode_init(struct msm_gpu *gpu)
 		}
 	}
 
-	gpu_write64(gpu, REG_A6XX_CP_SQE_INSTR_BASE_LO,
-		REG_A6XX_CP_SQE_INSTR_BASE_HI, a6xx_gpu->sqe_iova);
+	gpu_write64(gpu, REG_A6XX_CP_SQE_INSTR_BASE,
+		REG_A6XX_CP_SQE_INSTR_BASE+1, a6xx_gpu->sqe_iova);
 
 	return 0;
 }
@@ -797,7 +897,7 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
 		a6xx_set_hwcg(gpu, true);
 
 	/* VBIF/GBIF start*/
-	if (adreno_is_a640(adreno_gpu) || adreno_is_a650(adreno_gpu)) {
+	if (adreno_is_a640(adreno_gpu) || adreno_is_a650_family(adreno_gpu)) {
 		gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE0, 0x00071620);
 		gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE1, 0x00071620);
 		gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE2, 0x00071620);
@@ -822,7 +922,7 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
 	gpu_write(gpu, REG_A6XX_UCHE_WRITE_THRU_BASE_LO, 0xfffff000);
 	gpu_write(gpu, REG_A6XX_UCHE_WRITE_THRU_BASE_HI, 0x0001ffff);
 
-	if (!adreno_is_a650(adreno_gpu)) {
+	if (!adreno_is_a650_family(adreno_gpu)) {
 		/* Set the GMEM VA range [0x100000:0x100000 + gpu->gmem - 1] */
 		gpu_write64(gpu, REG_A6XX_UCHE_GMEM_RANGE_MIN_LO,
 			REG_A6XX_UCHE_GMEM_RANGE_MIN_HI, 0x00100000);
@@ -835,22 +935,27 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
 	gpu_write(gpu, REG_A6XX_UCHE_FILTER_CNTL, 0x804);
 	gpu_write(gpu, REG_A6XX_UCHE_CACHE_WAYS, 0x4);
 
-	if (adreno_is_a640(adreno_gpu) || adreno_is_a650(adreno_gpu))
+	if (adreno_is_a640(adreno_gpu) || adreno_is_a650_family(adreno_gpu))
 		gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x02000140);
 	else
 		gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x010000c0);
 	gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
 
+	if (adreno_is_a660(adreno_gpu))
+		gpu_write(gpu, REG_A6XX_CP_LPAC_PROG_FIFO_SIZE, 0x00000020);
+
 	/* Setting the mem pool size */
 	gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 128);
 
-	/* Setting the primFifo thresholds default values */
-	if (adreno_is_a650(adreno_gpu))
-		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300000);
+	/* Setting the primFifo thresholds default values,
+	 * and vccCacheSkipDis=1 bit (0x200) for A640 and newer
+	*/
+	if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
+		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
 	else if (adreno_is_a640(adreno_gpu))
-		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200000);
+		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
 	else
-		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, (0x300 << 11));
+		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00180000);
 
 	/* Set the AHB default slave response to "ERROR" */
 	gpu_write(gpu, REG_A6XX_CP_AHB_CNTL, 0x1);
@@ -859,7 +964,7 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
 	gpu_write(gpu, REG_A6XX_RBBM_PERFCTR_CNTL, 0x1);
 
 	/* Select CP0 to always count cycles */
-	gpu_write(gpu, REG_A6XX_CP_PERFCTR_CP_SEL_0, PERF_CP_ALWAYS_COUNT);
+	gpu_write(gpu, REG_A6XX_CP_PERFCTR_CP_SEL(0), PERF_CP_ALWAYS_COUNT);
 
 	a6xx_set_ubwc_config(gpu);
 
@@ -870,7 +975,7 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
 	gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, 1);
 
 	/* Set weights for bicubic filtering */
-	if (adreno_is_a650(adreno_gpu)) {
+	if (adreno_is_a650_family(adreno_gpu)) {
 		gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_0, 0);
 		gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_1,
 			0x3fe05ff4);
@@ -885,6 +990,13 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
 	/* Protect registers from the CP */
 	a6xx_set_cp_protect(gpu);
 
+	if (adreno_is_a660(adreno_gpu)) {
+		gpu_write(gpu, REG_A6XX_CP_CHICKEN_DBG, 0x1);
+		gpu_write(gpu, REG_A6XX_RBBM_GBIF_CLIENT_QOS_CNTL, 0x0);
+		/* Set dualQ + disable afull for A660 GPU but not for A635 */
+		gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x66906);
+	}
+
 	/* Enable expanded apriv for targets that support it */
 	if (gpu->hw_apriv) {
 		gpu_write(gpu, REG_A6XX_CP_APRIV_CNTL,
@@ -925,7 +1037,7 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
 	if (!a6xx_gpu->shadow_bo) {
 		a6xx_gpu->shadow = msm_gem_kernel_new_locked(gpu->dev,
 			sizeof(u32) * gpu->nr_rings,
-			MSM_BO_UNCACHED | MSM_BO_MAP_PRIV,
+			MSM_BO_WC | MSM_BO_MAP_PRIV,
 			gpu->aspace, &a6xx_gpu->shadow_bo,
 			&a6xx_gpu->shadow_iova);
@@ -1032,18 +1144,113 @@ static void a6xx_recover(struct msm_gpu *gpu)
 	msm_gpu_hw_init(gpu);
 }
 
-static int a6xx_fault_handler(void *arg, unsigned long iova, int flags)
+static const char *a6xx_uche_fault_block(struct msm_gpu *gpu, u32 mid)
+{
+	static const char *uche_clients[7] = {
+		"VFD", "SP", "VSC", "VPC", "HLSQ", "PC", "LRZ",
+	};
+	u32 val;
+
+	if (mid < 1 || mid > 3)
+		return "UNKNOWN";
+
+	/*
+	 * The source of the data depends on the mid ID read from FSYNR1.
+	 * and the client ID read from the UCHE block
+	 */
+	val = gpu_read(gpu, REG_A6XX_UCHE_CLIENT_PF);
+
+	/* mid = 3 is most precise and refers to only one block per client */
+	if (mid == 3)
+		return uche_clients[val & 7];
+
+	/* For mid=2 the source is TP or VFD except when the client id is 0 */
+	if (mid == 2)
+		return ((val & 7) == 0) ? "TP" : "TP|VFD";
+
+	/* For mid=1 just return "UCHE" as a catchall for everything else */
+	return "UCHE";
+}
+
+static const char *a6xx_fault_block(struct msm_gpu *gpu, u32 id)
+{
+	if (id == 0)
+		return "CP";
+	else if (id == 4)
+		return "CCU";
+	else if (id == 6)
+		return "CDP Prefetch";
+
+	return a6xx_uche_fault_block(gpu, id);
+}
+
+#define ARM_SMMU_FSR_TF			BIT(1)
+#define ARM_SMMU_FSR_PF			BIT(3)
+#define ARM_SMMU_FSR_EF			BIT(4)
+
+static int a6xx_fault_handler(void *arg, unsigned long iova, int flags, void *data)
 {
 	struct msm_gpu *gpu = arg;
+	struct adreno_smmu_fault_info *info = data;
+	const char *type = "UNKNOWN";
+	const char *block;
+	bool do_devcoredump = info && !READ_ONCE(gpu->crashstate);
+
+	/*
+	 * If we aren't going to be resuming later from fault_worker, then do
+	 * it now.
+	 */
+	if (!do_devcoredump) {
+		gpu->aspace->mmu->funcs->resume_translation(gpu->aspace->mmu);
+	}
 
-	pr_warn_ratelimited("*** gpu fault: iova=%08lx, flags=%d (%u,%u,%u,%u)\n",
+	/*
+	 * Print a default message if we couldn't get the data from the
+	 * adreno-smmu-priv
+	 */
+	if (!info) {
+		pr_warn_ratelimited("*** gpu fault: iova=%.16lx flags=%d (%u,%u,%u,%u)\n",
 			iova, flags,
 			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4)),
 			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(5)),
 			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(6)),
 			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(7)));
 
-	return -EFAULT;
+		return 0;
+	}
+
+	if (info->fsr & ARM_SMMU_FSR_TF)
+		type = "TRANSLATION";
+	else if (info->fsr & ARM_SMMU_FSR_PF)
+		type = "PERMISSION";
+	else if (info->fsr & ARM_SMMU_FSR_EF)
+		type = "EXTERNAL";
+
+	block = a6xx_fault_block(gpu, info->fsynr1 & 0xff);
+
+	pr_warn_ratelimited("*** gpu fault: ttbr0=%.16llx iova=%.16lx dir=%s type=%s source=%s (%u,%u,%u,%u)\n",
+			info->ttbr0, iova,
+			flags & IOMMU_FAULT_WRITE ? "WRITE" : "READ",
+			type, block,
+			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4)),
+			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(5)),
+			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(6)),
+			gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(7)));
+
+	if (do_devcoredump) {
+		/* Turn off the hangcheck timer to keep it from bothering us */
+		del_timer(&gpu->hangcheck_timer);
+
+		gpu->fault_info.ttbr0 = info->ttbr0;
+		gpu->fault_info.iova = iova;
+		gpu->fault_info.flags = flags;
+		gpu->fault_info.type = type;
+		gpu->fault_info.block = block;
+
+		kthread_queue_work(gpu->worker, &gpu->fault_work);
+	}
+
+	return 0;
 }
 
 static void a6xx_cp_hw_err_irq(struct msm_gpu *gpu)
@@ -1095,6 +1302,15 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
 	struct msm_ringbuffer *ring = gpu->funcs->active_ring(gpu);
 
 	/*
+	 * If stalled on SMMU fault, we could trip the GPU's hang detection,
+	 * but the fault handler will trigger the devcore dump, and we want
+	 * to otherwise resume normally rather than killing the submit, so
+	 * just bail.
+	 */
+	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS3) & A6XX_RBBM_STATUS3_SMMU_STALLED_ON_FAULT)
+		return;
+
+	/*
 	 * Force the GPU to stay on until after we finish
 	 * collecting information
 	 */
@@ -1339,9 +1555,6 @@ static void a6xx_destroy(struct msm_gpu *gpu)
 
 	adreno_gpu_cleanup(adreno_gpu);
 
-	if (a6xx_gpu->opp_table)
-		dev_pm_opp_put_supported_hw(a6xx_gpu->opp_table);
-
 	kfree(a6xx_gpu);
 }
 
@@ -1474,7 +1687,6 @@ static u32 fuse_to_supp_hw(struct device *dev, u32 revn, u32 fuse)
 static int a6xx_set_supported_hw(struct device *dev, struct a6xx_gpu *a6xx_gpu,
 		u32 revn)
 {
-	struct opp_table *opp_table;
 	u32 supp_hw = UINT_MAX;
 	u16 speedbin;
 	int ret;
@@ -1497,11 +1709,10 @@ static int a6xx_set_supported_hw(struct device *dev, struct a6xx_gpu *a6xx_gpu,
 	supp_hw = fuse_to_supp_hw(dev, revn, speedbin);
 
 done:
-	opp_table = dev_pm_opp_set_supported_hw(dev, &supp_hw, 1);
-	if (IS_ERR(opp_table))
-		return PTR_ERR(opp_table);
+	ret = devm_pm_opp_set_supported_hw(dev, &supp_hw, 1);
+	if (ret)
+		return ret;
 
-	a6xx_gpu->opp_table = opp_table;
 	return 0;
 }
 
@@ -1561,7 +1772,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	 */
 	info = adreno_info(config->rev);
 
-	if (info && info->revn == 650)
+	if (info && (info->revn == 650 || info->revn == 660))
		adreno_gpu->base.hw_apriv = true;
 
 	a6xx_llc_slices_init(pdev, a6xx_gpu);
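Two definitions the hunks above rely on live outside this file and are easy to miss: the adreno_is_a660()/adreno_is_a650_family() revision helpers (drivers/gpu/drm/msm/adreno/adreno_gpu.h) and the adreno_smmu_fault_info struct that the arm-smmu-qcom stall handler passes to the new fault handler through the void *data argument. A minimal sketch of both follows, assuming the simple revn comparisons and the field layout of include/linux/adreno-smmu-priv.h around this merge; consult the tree at this commit for the authoritative versions.

	/* Sketch of the revision helpers, not verbatim kernel source. */
	static inline int adreno_is_a660(struct adreno_gpu *gpu)
	{
		return gpu->revn == 660;
	}

	/* "a650 family" groups a650 with a660, which shares its GBIF and
	 * GMEM setup in a6xx_hw_init() above. */
	static inline int adreno_is_a650_family(struct adreno_gpu *gpu)
	{
		return gpu->revn == 650 || gpu->revn == 660;
	}

	/* Per-fault data handed to a6xx_fault_handler() via void *data,
	 * filled in by the arm-smmu-qcom driver (adreno-smmu-priv.h). */
	struct adreno_smmu_fault_info {
		u64 far;	/* faulting address (IOVA) */
		u64 ttbr0;	/* pagetable base of the faulting context */
		u32 contextidr;
		u32 fsr;	/* fault status; TF/PF/EF bits decoded above */
		u32 fsynr0;
		u32 fsynr1;	/* low byte feeds a6xx_fault_block() */
		u32 cbfrsynra;
	};

With this plumbing, a per-process pagetable fault can be attributed to a specific context (via ttbr0) and client block before the devcoredump is queued.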