author     Linus Torvalds <torvalds@linux-foundation.org>  2023-04-26 02:12:15 +0300
committer  Linus Torvalds <torvalds@linux-foundation.org>  2023-04-26 02:12:15 +0300
commit     c8cc58e289ed3b5bc50258f52776cf3dfa3bad66 (patch)
tree       fab95a9e92dd1b7ddec386294365ebd2ba130ec3 /drivers/accel/habanalabs/common/memory.c
parent     736b378b29d89c8c3567fa4b2e948be5568aebb8 (diff)
parent     289af45508ca890585f329376d16e08f41f75bd5 (diff)
Merge tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"There is a new Qualcomm accel driver for their QAIC, dma-fence got a
deadline feature added, lots of refactoring around fbdev emulation,
and the usual pre-release hw enablements from AMD and Intel and fixes
everywhere.
New drivers:
- add QAIC acceleration driver
dma-buf:
- constify kobj_type structs
- Reject prime DMA-Buf attachment if get_sg_table is missing.
fbdev:
- cmdline parser fixes
- implement fbdev emulation for GEM DMA drivers
- always use shadow buffer in fbdev emulation helpers
dma-fence:
- add deadline hint to fences (a usage sketch follows this list)
- signal private stub fence
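
A minimal sketch of the new deadline-hint flow; only dma_fence_set_deadline()
and the .set_deadline op come from this series, while the example driver
callback and its contents are hypothetical:

    #include <linux/dma-fence.h>
    #include <linux/ktime.h>

    /* Hypothetical driver callback: a real driver would boost clocks or
     * priority here so the fence signals before the deadline. The other
     * mandatory dma_fence_ops (get_driver_name, ...) are omitted. */
    static void example_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
    {
    }

    static const struct dma_fence_ops example_fence_ops = {
            .set_deadline = example_fence_set_deadline,
    };

    /* Consumer side, e.g. an atomic commit that wants the fence signaled
     * by the next vblank: pass the hint, then wait as usual. */
    static void example_wait_for_scanout(struct dma_fence *fence, ktime_t vblank)
    {
            dma_fence_set_deadline(fence, vblank);
            dma_fence_wait(fence, false);
    }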
core:
- improve DisplayID 2.0 and EDID parsing
- add gem eviction function + callback
- prep to convert shmem helper to GEM resv lock
- move suballocator from radeon/amdgpu to core for Xe
- HPD polling fixes
- Documentation improvements
- Add atomic enable_plane callback
- use tgid instead of pid for client tracking
- DP: Add SDP Error Detection Configuration Register
- Add prime import/export to vram-helper
- use pci aperture helpers in more drivers (probe-time sketch below)
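
For the aperture-helper item, a sketch of the probe-time pattern the
converted drivers share; aperture_remove_conflicting_pci_devices() is the
real helper from <linux/aperture.h>, while the driver name string is made up:

    #include <linux/aperture.h>
    #include <linux/pci.h>

    static int example_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
            int ret;

            /* Evict firmware framebuffers (efifb, simpledrm, ...) that may
             * still scan out of this device's BARs before taking over. */
            ret = aperture_remove_conflicting_pci_devices(pdev, "example-drm");
            if (ret)
                    return ret;

            /* ...normal device initialization continues here... */
            return 0;
    }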
panel:
- Radxa 8/10HD support
- Samsung AMS495QA01 support
- Elida KD50T048A
- Sony TD4353
- Novatek NT36523
- STARRY 2081101QFH032011-53G
- B133UAN01.0
- AUO NE135FBM-N41
i915:
- More MTL enabling
- fix s/r problems with MEI/PXP
- Implement fb_dirty for PSR,FBC,DRRS fixes
- Fix eDP+DSI dual panel systems
- Fix issue #6333: "list_add corruption" and full system lockup from
performance monitoring
- Don't use stolen memory or BAR for ring buffers on LLC platforms
- Make sure DSM size has correct 1MiB granularity on Gen12+
- Whitelist COMMON_SLICE_CHICKEN3 for UMD access on Gen12+
- Add engine TLB invalidation for Meteorlake
- Fix GSC races on driver load/unload on Meteorlake+
- Make kobj_type structures constant
- Move fd_install after last use of fence
- wm/vblank refactoring
- display code refactoring
- Create GSC submission targeting HDCP and PXP usages on MTL+
- Enable HDCP2.x via GSC CS
- Fix context runtime accounting on sysfs fdinfo for heavy workloads
- Use i915 instead of dev_priv inside the file_priv structure
- Replace fake flex-array with flexible-array member
amdgpu:
- Make kobj structures const
- Generalize dmabuf import to work with KFD
- Add capped/uncapped workload handling for supported APUs
- Expose additional memory stats via fdinfo
- Register vga_switcheroo for apple-gmux
- Initial NBIO7.9, GC 9.4.3, GFXHUB 1.2, MMHUB 1.8 support
- Initial DC FAM infrastructure
- Link DC backlight to connector device rather than PCI device
- Add sysfs nodes for secondary VCN clocks
amdkfd:
- Make kobj structures const
- Support for exporting buffers via dmabuf
- Multi-VMA page migration fixes
- initial GC 9.4.3 support
radeon:
- iMac fix
- convert to client based fbdev emulation
habanalabs:
- Add opcodes to the CS ioctl to allow the user to stall/resume
  specific engines inside Gaudi2.
- Add an INFO ioctl opcode to query the amount of device memory that
  the driver and f/w reserve for themselves.
- Add an INFO ioctl opcode to query a bit-mask of the available
  rotator engines.
- Add an INFO ioctl opcode to query the address of the f/w register
  that should be used to trigger interrupts.
- Add two new INFO ioctl opcodes to fetch information on h/w and f/w
  events.
- Enable graceful reset mechanism for compute-reset.
- Align to the latest firmware specs.
- Enforce the release order of the compute device and dma-buf (the
  underlying pinning pattern is sketched after this list; the full
  change is in the diff at the bottom of this page).
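
The release-order enforcement is the change visible in the diff below: each
exported dma-buf takes a reference on the compute device's file. The
mechanism is the generic get_file()/fput() pairing sketched here (struct and
function names are illustrative, not the driver's):

    #include <linux/file.h>
    #include <linux/fs.h>

    struct export_ref {
            struct file *dev_filp;  /* device file kept alive by the export */
    };

    static void export_ref_init(struct export_ref *ref, struct file *dev_filp)
    {
            /* Pin the device file so its ->release() cannot run while the
             * exported buffer exists. Paired with fput() below. */
            ref->dev_filp = get_file(dev_filp);
    }

    static void export_ref_release(struct export_ref *ref)
    {
            /* Drop the pin; the device file can be released only now,
             * strictly after the exported buffer was. */
            fput(ref->dev_filp);
    }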
msm:
- UBWC decoder programming rework
- SM8550, SM8450 bindings update
- uapi C++ fix
- a3xx and a4xx devfreq support
- GPU and GEM updates to avoid allocations which could trigger
  reclaim (shrinker) in the fence signaling path (see the sketch
  after this list)
- dma-fence deadline hint support and wait-boost
- a640/650 speed bin support
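
The fence-signaling item rests on one rule: an allocation made while
signaling a fence must not sleep in reclaim, since the shrinker can itself
wait on fences and deadlock. A minimal illustration of the non-blocking
fallback pattern, with hypothetical names:

    #include <linux/slab.h>

    struct cb_node {
            int payload;    /* stand-in for real callback state */
    };

    /* Called on the fence-signaling path: must not enter reclaim. */
    static struct cb_node *signaling_path_alloc(struct cb_node *prealloc)
    {
            struct cb_node *node;

            /* GFP_NOWAIT never sleeps and never invokes the shrinker, which
             * could otherwise block on the very fence being signaled. */
            node = kzalloc(sizeof(*node), GFP_NOWAIT);
            if (!node)
                    node = prealloc;  /* node reserved before entering the path */

            return node;
    }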
cirrus:
- convert to regular atomic helpers
- add damage clipping
mediatek:
- 10-bit overlay support
- mt8195 support
- Only trigger DRM HPD events if bridge is attached
- Change the AUX retry count when receiving AUX_DEFER
rockchip:
- add 4K support
vc4:
- use drm_gem_objects
virtio:
- allow KMS support to be disabled
- add damage clipping
vmwgfx:
- buffer object lifetime fixes
exynos:
- move MIPI DSI driver to drm bridge for iMX sharing
- use kernel fbdev emulation
panfrost:
- add support for mali MT81xx devices
- add speed binning support
lima:
- add usage stats
tegra:
- fbdev client conversion
vkms:
- Add primary plane positioning support"
* tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drm: (1495 commits)
drm/i915/dp_mst: Fix active port PLL selection for secondary MST streams
drm/exynos: Implement fbdev emulation as in-kernel client
drm/exynos: Initialize fbdev DRM client
drm/exynos: Remove fb_helper from struct exynos_drm_private
drm/exynos: Remove struct exynos_drm_fbdev
drm/exynos: Remove exynos_gem from struct exynos_drm_fbdev
drm/i915: Fix memory leaks in i915 selftests
drm/i915: Make intel_get_crtc_new_encoder() less oopsy
drm/i915/gt: Avoid out-of-bounds access when loading HuC
drm/amdgpu: add some basic elements for multiple XCD case
drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4)
Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV"
drm/amdgpu: add common ip block for GC 9.4.3
drm/amd/display: Add logging when DP link training Clock recovery is Successful
drm/amdgpu: add common early init support for GC 9.4.3
drm/amdgpu: switch to v9_4_3 gfx_funcs callbacks for GC 9.4.3
drm/amd/display: Add logging when setting DP sink power state fails
drm/amdkfd: Add gfx_target_version for GC 9.4.3
drm/amdkfd: Enable HW_UPDATE_RPTR on GC 9.4.3
drm/amdgpu: reserve the old gc_11_0_*_mes.bin
...
Diffstat (limited to 'drivers/accel/habanalabs/common/memory.c')
-rw-r--r--  drivers/accel/habanalabs/common/memory.c | 144
1 file changed, 79 insertions, 65 deletions
diff --git a/drivers/accel/habanalabs/common/memory.c b/drivers/accel/habanalabs/common/memory.c
index 761a47e89b00..a7b6a273ce21 100644
--- a/drivers/accel/habanalabs/common/memory.c
+++ b/drivers/accel/habanalabs/common/memory.c
@@ -235,10 +235,8 @@ static int dma_map_host_va(struct hl_device *hdev, u64 addr, u64 size,
 	}
 
 	rc = hl_pin_host_memory(hdev, addr, size, userptr);
-	if (rc) {
-		dev_err(hdev->dev, "Failed to pin host memory\n");
+	if (rc)
 		goto pin_err;
-	}
 
 	userptr->dma_mapped = true;
 	userptr->dir = DMA_BIDIRECTIONAL;
@@ -607,6 +605,7 @@ static u64 get_va_block(struct hl_device *hdev,
 	bool is_align_pow_2 = is_power_of_2(va_range->page_size);
 	bool is_hint_dram_addr = hl_is_dram_va(hdev, hint_addr);
 	bool force_hint = flags & HL_MEM_FORCE_HINT;
+	int rc;
 
 	if (is_align_pow_2)
 		align_mask = ~((u64)va_block_align - 1);
@@ -724,9 +723,13 @@ static u64 get_va_block(struct hl_device *hdev,
 		kfree(new_va_block);
 	}
 
-	if (add_prev)
-		add_va_block_locked(hdev, &va_range->list, prev_start,
-				prev_end);
+	if (add_prev) {
+		rc = add_va_block_locked(hdev, &va_range->list, prev_start, prev_end);
+		if (rc) {
+			reserved_valid_start = 0;
+			goto out;
+		}
+	}
 
 	print_va_list_locked(hdev, &va_range->list);
 
 out:
@@ -1097,10 +1100,8 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args, u64 *device
 		huge_page_size = hdev->asic_prop.pmmu_huge.page_size;
 
 		rc = dma_map_host_va(hdev, addr, size, &userptr);
-		if (rc) {
-			dev_err(hdev->dev, "failed to get userptr from va\n");
+		if (rc)
 			return rc;
-		}
 
 		rc = init_phys_pg_pack_from_userptr(ctx, userptr,
 				&phys_pg_pack, false);
@@ -1270,6 +1271,18 @@ init_page_pack_err:
 	return rc;
 }
 
+/* Should be called while the context's mem_hash_lock is taken */
+static struct hl_vm_hash_node *get_vm_hash_node_locked(struct hl_ctx *ctx, u64 vaddr)
+{
+	struct hl_vm_hash_node *hnode;
+
+	hash_for_each_possible(ctx->mem_hash, hnode, node, vaddr)
+		if (vaddr == hnode->vaddr)
+			return hnode;
+
+	return NULL;
+}
+
 /**
  * unmap_device_va() - unmap the given device virtual address.
  * @ctx: pointer to the context structure.
@@ -1285,10 +1298,10 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 {
 	struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
 	u64 vaddr = args->unmap.device_virt_addr;
-	struct hl_vm_hash_node *hnode = NULL;
 	struct asic_fixed_properties *prop;
 	struct hl_device *hdev = ctx->hdev;
 	struct hl_userptr *userptr = NULL;
+	struct hl_vm_hash_node *hnode;
 	struct hl_va_range *va_range;
 	enum vm_type *vm_type;
 	bool is_userptr;
@@ -1298,15 +1311,10 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 
 	/* protect from double entrance */
 	mutex_lock(&ctx->mem_hash_lock);
-	hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)vaddr)
-		if (vaddr == hnode->vaddr)
-			break;
-
+	hnode = get_vm_hash_node_locked(ctx, vaddr);
 	if (!hnode) {
 		mutex_unlock(&ctx->mem_hash_lock);
-		dev_err(hdev->dev,
-			"unmap failed, no mem hnode for vaddr 0x%llx\n",
-			vaddr);
+		dev_err(hdev->dev, "unmap failed, no mem hnode for vaddr 0x%llx\n", vaddr);
 		return -EINVAL;
 	}
 
@@ -1779,6 +1787,44 @@ static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment,
 	kfree(sgt);
 }
 
+static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 addr)
+{
+	struct hl_device *hdev = ctx->hdev;
+	struct hl_vm_hash_node *hnode;
+
+	/* get the memory handle */
+	mutex_lock(&ctx->mem_hash_lock);
+	hnode = get_vm_hash_node_locked(ctx, addr);
+	if (!hnode) {
+		mutex_unlock(&ctx->mem_hash_lock);
+		dev_dbg(hdev->dev, "map address %#llx not found\n", addr);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (upper_32_bits(hnode->handle)) {
+		mutex_unlock(&ctx->mem_hash_lock);
+		dev_dbg(hdev->dev, "invalid handle %#llx for map address %#llx\n",
+			hnode->handle, addr);
+		return ERR_PTR(-EINVAL);
+	}
+
+	/*
+	 * node found, increase export count so this memory cannot be unmapped
+	 * and the hash node cannot be deleted.
+	 */
+	hnode->export_cnt++;
+	mutex_unlock(&ctx->mem_hash_lock);
+
+	return hnode;
+}
+
+static void memhash_node_export_put(struct hl_ctx *ctx, struct hl_vm_hash_node *hnode)
+{
+	mutex_lock(&ctx->mem_hash_lock);
+	hnode->export_cnt--;
+	mutex_unlock(&ctx->mem_hash_lock);
+}
+
 static void hl_release_dmabuf(struct dma_buf *dmabuf)
 {
 	struct hl_dmabuf_priv *hl_dmabuf = dmabuf->priv;
@@ -1789,13 +1835,15 @@ static void hl_release_dmabuf(struct dma_buf *dmabuf)
 
 	ctx = hl_dmabuf->ctx;
 
-	if (hl_dmabuf->memhash_hnode) {
-		mutex_lock(&ctx->mem_hash_lock);
-		hl_dmabuf->memhash_hnode->export_cnt--;
-		mutex_unlock(&ctx->mem_hash_lock);
-	}
+	if (hl_dmabuf->memhash_hnode)
+		memhash_node_export_put(ctx, hl_dmabuf->memhash_hnode);
+
 	atomic_dec(&ctx->hdev->dmabuf_export_cnt);
 	hl_ctx_put(ctx);
+
+	/* Paired with get_file() in export_dmabuf() */
+	fput(ctx->hpriv->filp);
+
 	kfree(hl_dmabuf);
 }
 
@@ -1834,6 +1882,13 @@ static int export_dmabuf(struct hl_ctx *ctx,
 	hl_dmabuf->ctx = ctx;
 	hl_ctx_get(hl_dmabuf->ctx);
+	atomic_inc(&ctx->hdev->dmabuf_export_cnt);
+
+	/* Get compute device file to enforce release order, such that all exported dma-buf will be
+	 * released first and only then the compute device.
+	 * Paired with fput() in hl_release_dmabuf().
+	 */
+	get_file(ctx->hpriv->filp);
 
 	*dmabuf_fd = fd;
 
@@ -1933,47 +1988,6 @@ static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 s
 	return 0;
 }
 
-static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 addr)
-{
-	struct hl_device *hdev = ctx->hdev;
-	struct hl_vm_hash_node *hnode;
-
-	/* get the memory handle */
-	mutex_lock(&ctx->mem_hash_lock);
-	hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)addr)
-		if (addr == hnode->vaddr)
-			break;
-
-	if (!hnode) {
-		mutex_unlock(&ctx->mem_hash_lock);
-		dev_dbg(hdev->dev, "map address %#llx not found\n", addr);
-		return ERR_PTR(-EINVAL);
-	}
-
-	if (upper_32_bits(hnode->handle)) {
-		mutex_unlock(&ctx->mem_hash_lock);
-		dev_dbg(hdev->dev, "invalid handle %#llx for map address %#llx\n",
-			hnode->handle, addr);
-		return ERR_PTR(-EINVAL);
-	}
-
-	/*
-	 * node found, increase export count so this memory cannot be unmapped
-	 * and the hash node cannot be deleted.
-	 */
-	hnode->export_cnt++;
-	mutex_unlock(&ctx->mem_hash_lock);
-
-	return hnode;
-}
-
-static void memhash_node_export_put(struct hl_ctx *ctx, struct hl_vm_hash_node *hnode)
-{
-	mutex_lock(&ctx->mem_hash_lock);
-	hnode->export_cnt--;
-	mutex_unlock(&ctx->mem_hash_lock);
-}
-
 static struct hl_vm_phys_pg_pack *get_phys_pg_pack_from_hash_node(struct hl_device *hdev,
 						struct hl_vm_hash_node *hnode)
 {
@@ -2221,11 +2235,11 @@ static struct hl_mmap_mem_buf_behavior hl_ts_behavior = {
  * allocate_timestamps_buffers() - allocate timestamps buffers
  * This function will allocate ts buffer that will later on be mapped to the user
  * in order to be able to read the timestamp.
- * in additon it'll allocate an extra buffer for registration management.
+ * in addition it'll allocate an extra buffer for registration management.
  * since we cannot fail during registration for out-of-memory situation, so
 * we'll prepare a pool which will be used as user interrupt nodes and instead
 * of dynamically allocating nodes while registration we'll pick the node from
- * this pool. in addtion it'll add node to the mapping hash which will be used
+ * this pool. in addition it'll add node to the mapping hash which will be used
 * to map user ts buffer to the internal kernel ts buffer.
 * @hpriv: pointer to the private data of the fd
 * @args: ioctl input