summaryrefslogtreecommitdiff
path: root/drivers/misc/habanalabs/common
AgeCommit message (Collapse)AuthorFilesLines
2022-02-28habanalabs: avoid using an uninitialized variableTomer Tayar1-1/+1
Fix the following compilation warning in hl_cb_ioctl() @ command_buffer.c: warning: ‘device_va’ may be used uninitialized in this function Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: set max power on device init per ASICTomer Tayar2-1/+4
For current devices there is a need to send the max power value to F/W during device init, for example because there might be several card types. In future devices, this info will be programmed in the device's EEPROM and will be read by F/W, and hence the driver should not send it. Modify the sending of the relevant message to be done only for ASIC types that need it. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: use proper max_power variable for device utilizationTomer Tayar1-1/+1
The max_power variable which is used for calculating the device utilization is the ASIC specific property which is set during init. However, the max value can be modified via sysfs, and thus the updated value in the device structure should be used instead. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: enable stop-on-error debugfs setting per ASICTomer Tayar2-0/+8
On Goya and Gaudi, the stop-on-error configuration can be set via debugfs. However, in future devices, this configuration will always be enabled. Modify the debugfs node to be allowed only for ASICs that support this dynamic configuration. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: change function to staticOded Gabbay1-1/+1
handle_registration_node() is called directly from the irq handler in irq.c, so it can be static. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: add missing include of vmalloc.hOded Gabbay1-0/+1
Use of vfree(), vmalloc_user(), vmalloc() and remap_vmalloc_range() requires this include in some architectures. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: fix use-after-free bugOded Gabbay1-2/+2
When the code iterates over the free list of physical pages nodes, it deletes the physical page node which is used as the iterator. Therefore, we need to use the safe version of the iteration to prevent use-after-free. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: rephrase error messages in PCI initializationOded Gabbay1-2/+2
The iATU is an internal h/w machine inside Habana's PCI controller. Mentioning it by name doesn't say anything to the user. It is better to say the PCI controller initialization was not done successfully. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: fix spelling mistakeOded Gabbay1-1/+1
The name of the property is hints_range_reservation Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: Timestamps buffers registrationfarah kassabri6-56/+655
Timestamp registration API allows the user to register a timestamp record event which will make the driver set timestamp when CQ counter reaches the target value and write it to a specific location specified by the user. This is a non blocking API, unlike the wait_for_interrupt which is a blocking one. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: fix race when waiting on encaps signalDani Liberman1-5/+8
Scenario: 1. CS which is part of encaps signal has been completed and now executing kref_put to its encaps signal handle. The refcount of the handle decremented to 0, and called the encaps signal handle release function - hl_encaps_handle_do_release. 2. At this point the user starts waiting on the signal, and finds the encaps signal handle in the handlers list and increment the habdle refcount to 1. 3. Immediately after, hl_encaps_handle_do_release removed the handle from the list and free its memory. 4. Wait function using the handle although it has been freed. This scenario caused the slab area which was previously allocated for the handle to be poison overwritten which triggered kernel bug the next time the OS needed to allocate this slab. Fixed by getting the refcount of the handle only in case it is not zero. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: silence an uninitialized variable warningDan Carpenter1-0/+2
Smatch warns that: drivers/misc/habanalabs/common/command_buffer.c:471 hl_cb_ioctl() error: uninitialized symbol 'device_va'. Which is true, but harmless. Anyway, it's easy to silence this by adding a error check. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: remove duplicate printOded Gabbay1-6/+1
We print detailed messages inside the internal ioctl functions. No need to print a generic message at the end, it doesn't add any information. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: prevent false heartbeat failure during soft-resetTomer Tayar1-2/+5
The heartbeat thread is active during soft-reset, and it tries to send messages to CPU-CP core. Within the soft-reset, in the time window in which the device is marked as disabled, any CPU-CP command is "silently" skipped and a success value it returned. However, in addition to the return value, the heartbeat function also checks the F/W result, but because no command is sent in this time window, the result variable won't hold the expected value and we will have a false heartbeat failure. To avoid it, modify the "silent" skip to be done only in hard-reset. The CPU-CP should be able to handle messages during soft-reset. In addition to the heartbeat problem, this should also solve other issues in other flows that send messages during soft-reset and use the F/W result as it w/o being aware to the reset. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: fix race between wait and irqOded Gabbay1-1/+5
There is a race in the user interrupts code, where between checking the target value and adding the new pend to the list, there is a chance the interrupt happened. In that case, no one will complete the node, and we will get a timeout on it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: fix user interrupt wait when timeout is 0Oded Gabbay1-4/+6
When timeout is 0, we need to return the busy status in case the target value wasn't reached upon entry to the ioctl. Also return the correct timestamp. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: reject host map with mmu disabledOded Gabbay1-19/+11
This is not something we can do a workaround. It is clearly an error and we should notify the user that it is an error. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: expose number of user interruptsOded Gabbay1-2/+2
Currently we only expose to the user the ID of the first available user interrupt. To make user interrupts allocation truly dynamic, we need to also expose the number of user interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: add missing error check in sysfs max_power_showTomer Tayar3-3/+5
Add a missing error check in the sysfs show function for max_power. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: fix soft reset flow in case of failureDani Liberman1-0/+3
In case of soft reset failure, hard reset should be initiated, but reset flags were not set to enable it, which caused another soft reset followed by another failure. Updated reset flags to enable hard reset flow in case of soft reset failure. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: add missing error check in sysfs clk_freq_mhz_showTomer Tayar1-0/+4
Add a missing error check in the sysfs show functions for clk_max_freq_mhz and clk_cur_freq_mhz_show. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: avoid copying pll data if pll_info_get failsTomer Tayar1-2/+4
If reading PLL info from F/W fails, the PLL info is not set in the "result" variable, and hence shouldn't be copied to the caller's array. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: don't free phys_pg_pack inside lockOded Gabbay2-5/+14
Freeing phys_pg_pack includes calling to scrubbing functions of the device's memory, taking locks and possibly even calling reset. This is not something that should be done while holding a device-wide spinlock. Therefore, save the relevant objects on a local linked-list and after releasing the spinlock, traverse that list and free the phys_pg_pack objects. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: there is no kernel TDR in future ASICsOded Gabbay1-4/+13
In future ASICs, there is no kernel TDR for new workloads that are submitted directly from user-space to the device. Therefore, the driver can NEVER know that a workload has timed-out. So, when the user asks us to wait for interrupt on the workload's completion, and the wait has timed-out, it doesn't mean the workload has timed-out. It only means the wait has timed-out, which is NOT an error from driver's perspective. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: sysfs support for fw os versionRajaravi Krishna Katta1-0/+10
Adds new sysfs entry to display firmware os version /sys/class/habanalabs/hl<n>/fw_os_ver Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: remove power9 workaround for dma supportOded Gabbay2-7/+1
We don't need this workaround anymore. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: add vrm version to sysfsOded Gabbay2-19/+31
infineon version is only applicable to GOYA and GAUDI. For later ASICs, we display the Voltage Regulator Monitor f/w version. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: rename dev_attr_grp to dev_clk_attr_grpOded Gabbay2-5/+4
In this attribute group we are only adding clocks. This is in preparation for adding a device specific attribute group which is not related to clocks. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: remove asic callback set_pll_profile()Oded Gabbay2-5/+2
Setting PLL profile is the same for all ASICs, except for GOYA. However, because this function is never called from common code, there is no need to have an asic-specific callback function. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: move more f/w functions to firmware_if.cOded Gabbay4-139/+122
For better maintainability, try to concentrate all the common functions that communicate with the f/w in firmware_if.c Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: remove hwmgr.cOded Gabbay5-53/+45
The two remaining functions in this file belong to firmware_if.c, as they communicate with the firmware. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: get clk is common functionOded Gabbay3-15/+14
Retrieving the clock from the f/w is done exactly the same in ALL our ASICs. Therefore, no real justification for doing it as an ASIC-specific function. The only thing is we need to check if we are running on simulator, which doesn't require ASIC-specific callback. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: sysfs functions should be in sysfs.cOded Gabbay3-75/+70
Move common sysfs store/show functions to sysfs.c file for consistency. This is part of a patch-set to remove hwmgr.c Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: make some MMU functions commonOhad Sharabi3-37/+72
Some MMU functions can be used by different versions of our MMUs, so move them to be common. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: remove ASIC functions of clock gatingOded Gabbay2-9/+0
Now that clock gating is permanently disabled in GAUDI, no need for the ASIC functions of setting and disabling clock gating, as this was a unique scenario in GAUDI. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs/gaudi: disable CGM permanentlyOded Gabbay3-36/+1
Due to the need of SynapseAI to configure all TPC engines from a single QMAN, the driver must disable CGM and never allow the user to enable it. Otherwise, the configuration of the TPC engines will fail. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: fix possible memory leak in MMU DR finiOhad Sharabi1-1/+1
This patch fixes what seems to be copy paste error. We will have a memory leak if the host-resident shadow is NULL (which will likely happen as the DR and HR are not dependent). Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: check the return value of hl_cs_poll_fences()Tomer Tayar1-1/+1
As part of handling of the multi-CS wait ioctl, hl_cs_poll_fences() is called in a "while (true)" loop. This function can fail, but the checking of its return value was missed. Add this check and exit the loop in case of a failure. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: support hard-reset scheduling during soft-resetOfir Bitton2-3/+31
As hard-reset can be requested during soft-reset, driver must allow it or else critical events received during soft-reset will be ignored. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: add a lock to protect multiple reset variablesOfir Bitton4-26/+49
Atomic operations during reset are replaced by a spinlock in order to have the ability to protect more than a single variable. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: refactor reset information variablesOfir Bitton11-97/+110
Unify variables related to device reset, which will help us to add some new reset functionality in future patches. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: handle skip multi-CS if handling not doneOhad Sharabi1-1/+11
This patch fixes issue in which we have timeout for multi-CS although the CS in the list actually completed. Example scenario (the two threads marked as WAIT for the thread that handles the wait_for_multi_cs and CMPL as the thread that signal completion for both CS and multi-CS): 1. Submit CS with sequence X 2. [WAIT]: call wait_for_multi_cs with single CS X 3. [CMPL]: CS X do invoke complete_all for both CS and multi-CS (multi_cs_completion_done still false) 4. [WAIT]: enter poll_fences, reinit the completion and find the CS as completed when asking on the fence but multi_cs_done is still false it returns that no CS actually completed 5. [CMPL]: set multi_cs_handling_done as true 6. [WAIT]: wait for completion but no CS to awake the wait context and hence wait till timeout Solution: if CS detected as completed in poll_fences but multi_cs_done is still false invoke complete_all to the multi-CS completion and so it will not go to sleep in wait_for_completion but rather will have a "second chance" to wait for multi_cs_completion_done. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: add CPU-CP packet for engine core ASID cfgTomer Tayar2-0/+21
In some cases the driver cannot configure ASID of some engines due to the security level of the relevant registers. For this a new CPU-CP packet is introduced, which will allow the driver to ask the F/W to do this configuration instead. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: replace some -ENOTTY with -EINVALOded Gabbay3-5/+5
-ENOTTY is returned in case of error in the ioctl arguments themselves, such as function that doesn't exists. In all other cases, where the error is in the arguments of the custom data structures that we define that are passed in the various ioctls, we need to return -EINVAL. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: fix comments according to kernel-docOfir Bitton1-7/+17
Fix missing fields, descriptions not according to kernel-doc style. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: fix endianness when reading cpld versionOfir Bitton1-1/+1
Current sysfs implementation does not take endianness into consideration when dumping the cpld version. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: change wait_for_interrupt implementationfarah kassabri4-10/+145
Currently the cq counters are allocated in userspace memory, and mapped by the driver to the device address space. A new requirement that is part of new future API related to this one, requires that cq counters will be allocated in kernel memory. We leverage the existing cb_create API with KERNEL_MAPPED flag set to allocate this memory. That way we gain two things: 1. The memory cannot be freed while in use since it's protected by refcount in driver. 2. No need to wake up the user thread upon each interrupt from CQ, because the kernel has direct access to the counter. Therefore, it can make comparison with the target value in the interrupt handler and wake up the user thread only if the counter reaches the target value. This is instead of waking the thread up to copy counter value from user then go sleep again if target value wasn't reached. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: prevent wait if CS in multi-CS list completedOhad Sharabi2-34/+54
By the original design we assumed that if we "miss" multi CS completion it is of no severe consequence as we'll just call wait_for_multi_cs again. Sequence of events for such scenario: 1. user submit CS with sequence N 2. user calls wait for multi-CS with only CS #N in the list 3. the multi CS call starts with poll of the CSs but find that none completed (while CS #N did not completed yet) 4. now, multi CS #N complete but multi CS CTX was not yet created for the above multi-CS. so, attempt to complete multi-CS fails (as no multi CS CTX exist) 5. wait_for_multi_cs call now does init_wait_multi_cs_completion (and for this create the multi-CS CTX) 6. wait_for_multi_cs wits on completion but will not get one as CS #N already completed To fix the issue we initialize the multi-CS CTX prior polling the fences. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: modify cpu boot status error printOfir Bitton1-1/+1
As BTL can be replaced by ROM we should modify relevant error print. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: clean MMU headers definitionsOhad Sharabi1-4/+4
During the MMU development the MMU header files were left with unclean definitions: - MMU "version specific" definitions that were left in the mmu_general file - unused definitions This patch attempts, where possible, to keep definitions that can serve multiple MMU versions (but that are not tightly bound with specific MMU arch) in the mmu_general header file (e.g. different definitions for number of HOPs). Otherwise, move MMU version specific definitions (e.g. HOPs masks and shifts) to the specific MMU version file. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>