author     Linus Torvalds <torvalds@linux-foundation.org>  2023-04-26 02:12:15 +0300
committer  Linus Torvalds <torvalds@linux-foundation.org>  2023-04-26 02:12:15 +0300
commit     c8cc58e289ed3b5bc50258f52776cf3dfa3bad66 (patch)
tree       fab95a9e92dd1b7ddec386294365ebd2ba130ec3 /drivers/accel/habanalabs/common/memory.c
parent     736b378b29d89c8c3567fa4b2e948be5568aebb8 (diff)
parent     289af45508ca890585f329376d16e08f41f75bd5 (diff)
Merge tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"There is a new Qualcomm accel driver for their QAIC, dma-fence got a
deadline feature added, lots of refactoring around fbdev emulation,
and the usual pre-release hw enablements from AMD and Intel and fixes
everywhere.
New drivers:
- add QAIC acceleration driver
dma-buf:
- constify kobj_type structs
- Reject prime DMA-Buf attachment if get_sg_table is missing.
fbdev:
- cmdline parser fixes
- implement fbdev emulation for GEM DMA drivers
- always use shadow buffer in fbdev emulation helpers
dma-fence:
- add deadline hint to fences (a usage sketch follows this list)
- signal private stub fence
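
A minimal sketch of the new deadline-hint flow; only dma_fence_set_deadline()
and the .set_deadline op come from this series, while the example driver
callback and its contents are hypothetical:

    #include <linux/dma-fence.h>
    #include <linux/ktime.h>

    /* Hypothetical driver callback: a real driver would boost clocks or
     * priority here so the fence signals before the deadline. The other
     * mandatory dma_fence_ops (get_driver_name, ...) are omitted. */
    static void example_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
    {
    }

    static const struct dma_fence_ops example_fence_ops = {
            .set_deadline = example_fence_set_deadline,
    };

    /* Consumer side, e.g. an atomic commit that wants the fence signaled
     * by the next vblank: pass the hint, then wait as usual. */
    static void example_wait_for_scanout(struct dma_fence *fence, ktime_t vblank)
    {
            dma_fence_set_deadline(fence, vblank);
            dma_fence_wait(fence, false);
    }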
core:
- improve DisplayID 2.0 and EDID parsing
- add gem eviction function + callback
- prep to convert shmem helper to GEM resv lock
- move suballocator from radeon/amdgpu to core for Xe
- HPD polling fixes
- Documentation improvements
- Add atomic enable_plane callback
- use tgid instead of pid for client tracking
- DP: Add SDP Error Detection Configuration Register
- Add prime import/export to vram-helper
- use pci aperture helpers in more drivers (probe-time sketch below)
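
For the aperture-helper item, a sketch of the probe-time pattern the
converted drivers share; aperture_remove_conflicting_pci_devices() is the
real helper from <linux/aperture.h>, while the driver name string is made up:

    #include <linux/aperture.h>
    #include <linux/pci.h>

    static int example_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
            int ret;

            /* Evict firmware framebuffers (efifb, simpledrm, ...) that may
             * still scan out of this device's BARs before taking over. */
            ret = aperture_remove_conflicting_pci_devices(pdev, "example-drm");
            if (ret)
                    return ret;

            /* ...normal device initialization continues here... */
            return 0;
    }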
panel:
- Radxa 8/10HD support
- Samsung AMS495QA01 support
- Elida KD50T048A
- Sony TD4353
- Novatek NT36523
- STARRY 2081101QFH032011-53G
- B133UAN01.0
- AUO NE135FBM-N41
i915:
- More MTL enabling
- fix s/r problems with MEI/PXP
- Implement fb_dirty for PSR,FBC,DRRS fixes
- Fix eDP+DSI dual panel systems
- Fix issue #6333: "list_add corruption" and full system lockup from
performance monitoring
- Don't use stolen memory or BAR for ring buffers on LLC platforms
- Make sure DSM size has correct 1MiB granularity on Gen12+
- Whitelist COMMON_SLICE_CHICKEN3 for UMD access on Gen12+
- Add engine TLB invalidation for Meteorlake
- Fix GSC races on driver load/unload on Meteorlake+
- Make kobj_type structures constant
- Move fd_install after last use of fence
- wm/vblank refactoring
- display code refactoring
- Create GSC submission targeting HDCP and PXP usages on MTL+
- Enable HDCP2.x via GSC CS
- Fix context runtime accounting on sysfs fdinfo for heavy workloads
- Use i915 instead of dev_priv inside the file_priv structure
- Replace fake flex-array with flexible-array member
amdgpu:
- Make kobj structures const
- Generalize dmabuf import to work with KFD
- Add capped/uncapped workload handling for supported APUs
- Expose additional memory stats via fdinfo
- Register vga_switcheroo for apple-gmux
- Initial NBIO7.9, GC 9.4.3, GFXHUB 1.2, MMHUB 1.8 support
- Initial DC FAM infrastructure
- Link DC backlight to connector device rather than PCI device
- Add sysfs nodes for secondary VCN clocks
amdkfd:
- Make kobj structures const
- Support for exporting buffers via dmabuf
- Multi-VMA page migration fixes
- initial GC 9.4.3 support
radeon:
- iMac fix
- convert to client based fbdev emulation
habanalabs:
- Add opcodes to the CS ioctl to allow the user to stall/resume
  specific engines inside Gaudi2.
- Add an INFO ioctl opcode to query the amount of device memory that
  the driver and f/w reserve for themselves.
- Add an INFO ioctl opcode to query a bit-mask of the available
  rotator engines.
- Add an INFO ioctl opcode to query the address of the f/w register
  that should be used to trigger interrupts.
- Add two new INFO ioctl opcodes to fetch information on h/w and f/w
  events.
- Enable graceful reset mechanism for compute-reset.
- Align to the latest firmware specs.
- Enforce the release order of the compute device and dma-buf (the
  underlying pinning pattern is sketched after this list; the full
  change is in the diff at the bottom of this page).
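
The release-order enforcement is the change visible in the diff below: each
exported dma-buf takes a reference on the compute device's file. The
mechanism is the generic get_file()/fput() pairing sketched here (struct and
function names are illustrative, not the driver's):

    #include <linux/file.h>
    #include <linux/fs.h>

    struct export_ref {
            struct file *dev_filp;  /* device file kept alive by the export */
    };

    static void export_ref_init(struct export_ref *ref, struct file *dev_filp)
    {
            /* Pin the device file so its ->release() cannot run while the
             * exported buffer exists. Paired with fput() below. */
            ref->dev_filp = get_file(dev_filp);
    }

    static void export_ref_release(struct export_ref *ref)
    {
            /* Drop the pin; the device file can be released only now,
             * strictly after the exported buffer was. */
            fput(ref->dev_filp);
    }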
msm:
- UBWC decoder programming rework
- SM8550, SM8450 bindings update
- uapi C++ fix
- a3xx and a4xx devfreq support
- GPU and GEM updates to avoid allocations which could trigger
  reclaim (shrinker) in the fence signaling path (see the sketch
  after this list)
- dma-fence deadline hint support and wait-boost
- a640/650 speed bin support
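
The fence-signaling item rests on one rule: an allocation made while
signaling a fence must not sleep in reclaim, since the shrinker can itself
wait on fences and deadlock. A minimal illustration of the non-blocking
fallback pattern, with hypothetical names:

    #include <linux/slab.h>

    struct cb_node {
            int payload;    /* stand-in for real callback state */
    };

    /* Called on the fence-signaling path: must not enter reclaim. */
    static struct cb_node *signaling_path_alloc(struct cb_node *prealloc)
    {
            struct cb_node *node;

            /* GFP_NOWAIT never sleeps and never invokes the shrinker, which
             * could otherwise block on the very fence being signaled. */
            node = kzalloc(sizeof(*node), GFP_NOWAIT);
            if (!node)
                    node = prealloc;  /* node reserved before entering the path */

            return node;
    }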
cirrus:
- convert to regular atomic helpers
- add damage clipping
mediatek:
- 10-bit overlay support
- mt8195 support
- Only trigger DRM HPD events if bridge is attached
- Change the AUX retry count when receiving AUX_DEFER
rockchip:
- add 4K support
vc4:
- use drm_gem_objects
virtio:
- allow KMS support to be disabled
- add damage clipping
vmwgfx:
- buffer object lifetime fixes
exynos:
- move MIPI DSI driver to drm bridge for iMX sharing
- use kernel fbdev emulation
panfrost:
- add support for mali MT81xx devices
- add speed binning support
lima:
- add usage stats
tegra:
- fbdev client conversion
vkms:
- Add primary plane positioning support"
* tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drm: (1495 commits)
drm/i915/dp_mst: Fix active port PLL selection for secondary MST streams
drm/exynos: Implement fbdev emulation as in-kernel client
drm/exynos: Initialize fbdev DRM client
drm/exynos: Remove fb_helper from struct exynos_drm_private
drm/exynos: Remove struct exynos_drm_fbdev
drm/exynos: Remove exynos_gem from struct exynos_drm_fbdev
drm/i915: Fix memory leaks in i915 selftests
drm/i915: Make intel_get_crtc_new_encoder() less oopsy
drm/i915/gt: Avoid out-of-bounds access when loading HuC
drm/amdgpu: add some basic elements for multiple XCD case
drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4)
Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV"
drm/amdgpu: add common ip block for GC 9.4.3
drm/amd/display: Add logging when DP link training Clock recovery is Successful
drm/amdgpu: add common early init support for GC 9.4.3
drm/amdgpu: switch to v9_4_3 gfx_funcs callbacks for GC 9.4.3
drm/amd/display: Add logging when setting DP sink power state fails
drm/amdkfd: Add gfx_target_version for GC 9.4.3
drm/amdkfd: Enable HW_UPDATE_RPTR on GC 9.4.3
drm/amdgpu: reserve the old gc_11_0_*_mes.bin
...
Diffstat (limited to 'drivers/accel/habanalabs/common/memory.c')
-rw-r--r--  drivers/accel/habanalabs/common/memory.c | 144
1 file changed, 79 insertions, 65 deletions
diff --git a/drivers/accel/habanalabs/common/memory.c b/drivers/accel/habanalabs/common/memory.c
index 761a47e89b00..a7b6a273ce21 100644
--- a/drivers/accel/habanalabs/common/memory.c
+++ b/drivers/accel/habanalabs/common/memory.c
@@ -235,10 +235,8 @@ static int dma_map_host_va(struct hl_device *hdev, u64 addr, u64 size,
 	}
 
 	rc = hl_pin_host_memory(hdev, addr, size, userptr);
-	if (rc) {
-		dev_err(hdev->dev, "Failed to pin host memory\n");
+	if (rc)
 		goto pin_err;
-	}
 
 	userptr->dma_mapped = true;
 	userptr->dir = DMA_BIDIRECTIONAL;
@@ -607,6 +605,7 @@ static u64 get_va_block(struct hl_device *hdev,
 	bool is_align_pow_2 = is_power_of_2(va_range->page_size);
 	bool is_hint_dram_addr = hl_is_dram_va(hdev, hint_addr);
 	bool force_hint = flags & HL_MEM_FORCE_HINT;
+	int rc;
 
 	if (is_align_pow_2)
 		align_mask = ~((u64)va_block_align - 1);
@@ -724,9 +723,13 @@ static u64 get_va_block(struct hl_device *hdev,
 		kfree(new_va_block);
 	}
 
-	if (add_prev)
-		add_va_block_locked(hdev, &va_range->list, prev_start,
-				prev_end);
+	if (add_prev) {
+		rc = add_va_block_locked(hdev, &va_range->list, prev_start, prev_end);
+		if (rc) {
+			reserved_valid_start = 0;
+			goto out;
+		}
+	}
 
 	print_va_list_locked(hdev, &va_range->list);
 
 out:
@@ -1097,10 +1100,8 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args, u64 *device
 		huge_page_size = hdev->asic_prop.pmmu_huge.page_size;
 
 		rc = dma_map_host_va(hdev, addr, size, &userptr);
-		if (rc) {
-			dev_err(hdev->dev, "failed to get userptr from va\n");
+		if (rc)
 			return rc;
-		}
 
 		rc = init_phys_pg_pack_from_userptr(ctx, userptr,
 				&phys_pg_pack, false);
@@ -1270,6 +1271,18 @@ init_page_pack_err:
 	return rc;
 }
 
+/* Should be called while the context's mem_hash_lock is taken */
+static struct hl_vm_hash_node *get_vm_hash_node_locked(struct hl_ctx *ctx, u64 vaddr)
+{
+	struct hl_vm_hash_node *hnode;
+
+	hash_for_each_possible(ctx->mem_hash, hnode, node, vaddr)
+		if (vaddr == hnode->vaddr)
+			return hnode;
+
+	return NULL;
+}
+
 /**
  * unmap_device_va() - unmap the given device virtual address.
  * @ctx: pointer to the context structure.
@@ -1285,10 +1298,10 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 {
 	struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
 	u64 vaddr = args->unmap.device_virt_addr;
-	struct hl_vm_hash_node *hnode = NULL;
 	struct asic_fixed_properties *prop;
 	struct hl_device *hdev = ctx->hdev;
 	struct hl_userptr *userptr = NULL;
+	struct hl_vm_hash_node *hnode;
 	struct hl_va_range *va_range;
 	enum vm_type *vm_type;
 	bool is_userptr;
@@ -1298,15 +1311,10 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 
 	/* protect from double entrance */
 	mutex_lock(&ctx->mem_hash_lock);
-	hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)vaddr)
-		if (vaddr == hnode->vaddr)
-			break;
-
+	hnode = get_vm_hash_node_locked(ctx, vaddr);
 	if (!hnode) {
 		mutex_unlock(&ctx->mem_hash_lock);
-		dev_err(hdev->dev,
-			"unmap failed, no mem hnode for vaddr 0x%llx\n",
-			vaddr);
+		dev_err(hdev->dev, "unmap failed, no mem hnode for vaddr 0x%llx\n", vaddr);
 		return -EINVAL;
 	}
 
@@ -1779,6 +1787,44 @@ static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment,
 	kfree(sgt);
 }
 
+static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 addr)
+{
+	struct hl_device *hdev = ctx->hdev;
+	struct hl_vm_hash_node *hnode;
+
+	/* get the memory handle */
+	mutex_lock(&ctx->mem_hash_lock);
+	hnode = get_vm_hash_node_locked(ctx, addr);
+	if (!hnode) {
+		mutex_unlock(&ctx->mem_hash_lock);
+		dev_dbg(hdev->dev, "map address %#llx not found\n", addr);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (upper_32_bits(hnode->handle)) {
+		mutex_unlock(&ctx->mem_hash_lock);
+		dev_dbg(hdev->dev, "invalid handle %#llx for map address %#llx\n",
+			hnode->handle, addr);
+		return ERR_PTR(-EINVAL);
+	}
+
+	/*
+	 * node found, increase export count so this memory cannot be unmapped
+	 * and the hash node cannot be deleted.
+	 */
+	hnode->export_cnt++;
+	mutex_unlock(&ctx->mem_hash_lock);
+
+	return hnode;
+}
+
+static void memhash_node_export_put(struct hl_ctx *ctx, struct hl_vm_hash_node *hnode)
+{
+	mutex_lock(&ctx->mem_hash_lock);
+	hnode->export_cnt--;
+	mutex_unlock(&ctx->mem_hash_lock);
+}
+
 static void hl_release_dmabuf(struct dma_buf *dmabuf)
 {
 	struct hl_dmabuf_priv *hl_dmabuf = dmabuf->priv;
@@ -1789,13 +1835,15 @@ static void hl_release_dmabuf(struct dma_buf *dmabuf)
 
 	ctx = hl_dmabuf->ctx;
 
-	if (hl_dmabuf->memhash_hnode) {
-		mutex_lock(&ctx->mem_hash_lock);
-		hl_dmabuf->memhash_hnode->export_cnt--;
-		mutex_unlock(&ctx->mem_hash_lock);
-	}
+	if (hl_dmabuf->memhash_hnode)
+		memhash_node_export_put(ctx, hl_dmabuf->memhash_hnode);
+
 	atomic_dec(&ctx->hdev->dmabuf_export_cnt);
 	hl_ctx_put(ctx);
+
+	/* Paired with get_file() in export_dmabuf() */
+	fput(ctx->hpriv->filp);
+
 	kfree(hl_dmabuf);
 }
 
@@ -1834,6 +1882,13 @@ static int export_dmabuf(struct hl_ctx *ctx,
 	hl_dmabuf->ctx = ctx;
 	hl_ctx_get(hl_dmabuf->ctx);
+	atomic_inc(&ctx->hdev->dmabuf_export_cnt);
+
+	/* Get compute device file to enforce release order, such that all exported dma-buf will be
+	 * released first and only then the compute device.
+	 * Paired with fput() in hl_release_dmabuf().
+	 */
+	get_file(ctx->hpriv->filp);
 
 	*dmabuf_fd = fd;
 
@@ -1933,47 +1988,6 @@ static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 s
 	return 0;
 }
 
-static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 addr)
-{
-	struct hl_device *hdev = ctx->hdev;
-	struct hl_vm_hash_node *hnode;
-
-	/* get the memory handle */
-	mutex_lock(&ctx->mem_hash_lock);
-	hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)addr)
-		if (addr == hnode->vaddr)
-			break;
-
-	if (!hnode) {
-		mutex_unlock(&ctx->mem_hash_lock);
-		dev_dbg(hdev->dev, "map address %#llx not found\n", addr);
-		return ERR_PTR(-EINVAL);
-	}
-
-	if (upper_32_bits(hnode->handle)) {
-		mutex_unlock(&ctx->mem_hash_lock);
-		dev_dbg(hdev->dev, "invalid handle %#llx for map address %#llx\n",
-			hnode->handle, addr);
-		return ERR_PTR(-EINVAL);
-	}
-
-	/*
-	 * node found, increase export count so this memory cannot be unmapped
-	 * and the hash node cannot be deleted.
-	 */
-	hnode->export_cnt++;
-	mutex_unlock(&ctx->mem_hash_lock);
-
-	return hnode;
-}
-
-static void memhash_node_export_put(struct hl_ctx *ctx, struct hl_vm_hash_node *hnode)
-{
-	mutex_lock(&ctx->mem_hash_lock);
-	hnode->export_cnt--;
-	mutex_unlock(&ctx->mem_hash_lock);
-}
-
 static struct hl_vm_phys_pg_pack *get_phys_pg_pack_from_hash_node(struct hl_device *hdev,
 						struct hl_vm_hash_node *hnode)
 {
@@ -2221,11 +2235,11 @@ static struct hl_mmap_mem_buf_behavior hl_ts_behavior = {
  * allocate_timestamps_buffers() - allocate timestamps buffers
  * This function will allocate ts buffer that will later on be mapped to the user
  * in order to be able to read the timestamp.
- * in additon it'll allocate an extra buffer for registration management.
+ * in addition it'll allocate an extra buffer for registration management.
  * since we cannot fail during registration for out-of-memory situation, so
 * we'll prepare a pool which will be used as user interrupt nodes and instead
 * of dynamically allocating nodes while registration we'll pick the node from
- * this pool. in addtion it'll add node to the mapping hash which will be used
+ * this pool. in addition it'll add node to the mapping hash which will be used
 * to map user ts buffer to the internal kernel ts buffer.
 * @hpriv: pointer to the private data of the fd
 * @args: ioctl input