starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2021-08-29	habanalabs: get multiple fences under same cs_lock	Ohad Sharabi	3	-51/+113
	To add proper support for wait-for-multi-CS, locking the CS lock for each CS fence in the list is not efficient. Instead, this patch add support to lock the CS lock once to get all required fences. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: revise prints on FD close	Oded Gabbay	2	-3/+3
	The driver quietly handles memory mappings that were not freed so no need to print a warning about that when user closes the FD. Accordingly, revise the text that is printed in case the device is still in use after the user process closed the FD. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: fix race between soft reset and heartbeat	Koby Elbaz	4	-80/+67
	There is a scenario where an ongoing soft reset would race with an ongoing heartbeat routine, eventually causing heartbeat to fail and thus to escalate into a hard reset. With this fix, soft-reset procedure will disable heartbeat CPU messages and flush the (ongoing) current one before continuing with reset code. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: set dma max segment size	Oded Gabbay	1	-0/+2
	This is required from any device that is capable to perform DMA. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: add asic property of host dma offset	Oded Gabbay	1	-0/+3
	Each ASIC can have a different offset to add to a host dma address, to enable the ASIC to access that host memory. The usage for this can be common code so add this to the asic property structure. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: fix type of variable	Oded Gabbay	1	-2/+1
	Recently, the size parameter in userptr structure was change to u64. As a result, we need to change the type of the local range_size in device_va_to_pa() to u64 to avoid overflow. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: mark linux image as not loaded after hw_fini	Tomer Tayar	1	-0/+5
	If hard reset fails after the call to hw_fini and before loading the linux image to the device, a subsequent call to hw_fini should communicate via COMMS (or MSG_TO_CPU regs for old FW versions). However, the driver still tries in this case to communicate via the GIC, and thus no hard reset is actually done. To avoid that, the patch clears the linux_loaded flag after every call to hw_fini. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: fix nullifying of destroyed mmu pgt pool	Tomer Tayar	1	-6/+6
	In case of host-resident MMU, when the page tables pool is destroyed, its pointer is not nullified correctly. As a result, on a device fini which happens after a failing reset, the already destroyed pool is accessed, which leads to a kernel panic. The patch fixes the setting of the pool pointer to NULL. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: rename cb_mmap to mmap	Zvika Yehudai	2	-3/+3
	This function will be used for more mmap operations than just mmaping CBs. Signed-off-by: Zvika Yehudai <zyehudai@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: missing mutex_unlock in process kill procedure	Ofir Bitton	1	-0/+1
	missing mutex unlock once driver is giving up killing user processes. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: state dump monitors and fences infrastructure	Yuri Nudelman	2	-23/+322
	With the infrastructure in place, monitors and fences dump shall be implemented. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: expose state dump	Yuri Nudelman	6	-1/+685
	To improve the user's ability to debug the case where a workload that is part of executing training/inference of a topology is getting stuck, we need to add a 'core dump' each time a CS times-out. The 'core dump' shall contain all relevant Sync Manager information and corresponding fence values. The most recent dumps shall be accessible via debugfs, under 'state_dump' node. Reading from the node will provide the oldest dump available. Writing an integer value X will discard X dumps, starting with the oldest one, i.e. subsequent read will now return newer dumps. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: use get_task_pid() to take PID	Oded Gabbay	2	-2/+6
	The previous function we used, find_get_pid(), wasn't good in case the user process was run inside docker. As a result, we didn't had the PID and we couldn't kill the user process in case the device got stuck and we needed to reset the device. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: allow disabling huge page use	Oded Gabbay	1	-13/+23
	Sometimes we may need to disable optimization of using huge pages in our memory management code. Add such a flag to the function that creates the list of physical pages that would be programmed into the device MMU. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: user mappings can be 64-bit	Oded Gabbay	2	-3/+3
	Increase the size variable in the userptr structure to 64-bit. That variable describes the size of the memory allocation of the user that is now being mapped into the device. The mapping can be larger than 4GB, so we need to support it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: handle case of interruptable wait	Oded Gabbay	1	-2/+9
	Same as we handle it in the regular wait for CS, we need to handle the case where the waiting for user interrupt was interrupted. In that case, we need to return correct error code to the user. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: release pending user interrupts on device fini	Oded Gabbay	1	-54/+46
	In device fini there was missing a call to release all pending user interrupts. That can cause a process to be stuck inside the driver's IOCTL of wait for interrupts, in case the device is removed or simulator is killed at the same time. In addition, also call to remove inactive codec job was missing. Moreover, to prevent such errors in the future (where code is added to reset path but not to device fini), we moved some common parts to two dedicated functions: cleanup_resources take_release_locks Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: re-init completion object upon retry	Oded Gabbay	1	-1/+12
	In case user interrupt arrived but the completion value is less than the target value, we want to retry the wait. However, before the retry we must reinitialize the completion object, under spin-lock, so the wait function won't exit immediately because the completion object is already completed (from the previous interrupt). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: rename enum vm_type_t to vm_type	Oded Gabbay	3	-14/+12
	We don't use typedefs so the enum name shouldn't end with _t Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: update firmware header files	Ofir Bitton	1	-2/+2
	Update recent changes made in firmware header files, which contain a minor COMMS protocol change and new error status definitions. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: allow fail on inability to respect hint	Yuri Nudelman	1	-3/+42
	A new user flag is required to make memory map hint mandatory, in contrast to the current situation where it is best effort. This is due to the requirement to map certain data to specific pre-determined device virtual address ranges. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29	habanalabs: support hint addresses range reservation	farah kassabri	2	-8/+76
	Add support for pre-determined driver-reserved device VA address ranges. This is needed for future ASIC support where some contents must be mapped into these pre-determined ranges because the H/W will be configured using these ranges. In case the user asks to map a VA without a hint address, avoid allocating the device VA from the reserved ranges. Make sure the validation checks of the hint address take into account situation where the DRAM page size is not pow of 2. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21	habanalabs/gaudi: refactor hard-reset related code	Koby Elbaz	2	-0/+50
	There is code related to hard-reset, which is done in gaudi specific code. However, this code can be used by future ASICs and therefore it is better to move it to the common code section. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21	habanalabs: add validity check for signal cs	farah kassabri	3	-24/+75
	In preparation for a new feature that allows the user to reserve signals ahead of submissions, we need to change a current assumption in the code. Currently, the driver uses 2 SOBs to support signal CS. When the first SOB reaches max value, the driver switches to the other one and assumes that when it will need to switch back to the first one, all of the signals have already been handled. This assumption won't hold when the new feature will be added, because using signal reservation, the driver can reach the max SOB value very fast. The change is to add a validity check when submitting a signal CS, to make sure the previous SOB is available (all the signals attached to it indeed finished). Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21	habanalabs: allow reset upon device release	Ofir Bitton	3	-5/+29
	We introduce a new type of reset which is reset upon device release. This reset is very similar to soft reset except the fact it is performed only upon device release and not upon user sysfs request nor TDR. The purpose of this reset is to make sure the device is returned to IDLE state after the current user has finished working with the device. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	debugfs: add skip_reset_on_timeout option	Yuri Nudelman	3	-0/+9
	To be able to debug long-running CS better, without changing the userspace code, we are adding a new option through debugfs interface to skip the reset of the device in case of CS timeout. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: fix typo	Zvika Yehudai	1	-1/+1
	fix a spelling error in comment Signed-off-by: Zvika Yehudai <zyehudai@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: remove a rogue #ifdef	Oded Gabbay	1	-5/+0
	There was a rogue #ifdef that crept into the upstream code for backwards compatibility which isn't needed of course. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: added open_stats info ioctl	Yuri Nudelman	4	-0/+35
	In a system with multiple ASICs, there is a need to provide monitoring tools with information on how long a device was opened and how many times a device was opened. Therefore, we add a new opcode to the INFO ioctl to provide that information. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: set rc as 'valid' in case of intentional func exit	Koby Elbaz	2	-3/+7
	fix the following smatch warnings: hl_fw_static_init_cpu() warn: missing error code 'rc' Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: zero complex structures using memset	Koby Elbaz	1	-1/+2
	fix the following sparse warnings: 'warning: Using plain integer as NULL pointer' 'warning: missing braces around initializer' Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: print more info when failing to pin user memory	Tomer Tayar	1	-2/+2
	pin_user_pages_fast() might fail and return a negative number, or pin less pages than requested and return the number of the pages that were pinned. For the latter, it is informative to print also the memory size and the number of requested pages. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: Fix an error handling path in 'hl_pci_probe()'	Christophe JAILLET	1	-0/+1
	If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function. Fixes: 2e5eda4681f9 ("habanalabs: PCIe Advanced Error Reporting support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: print firmware versions	Oded Gabbay	1	-10/+95
	Firmware in habanalabs devices is composed of several components. During device initialization, we read these versions from the device. Print them during device initialization to allow better visibility in automated systems. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: add hard reset timeout for PLDM	Omer Shpigelman	2	-2/+8
	Hard reset flow on PLDM might take more than 2 minutes. Hence add a dedicated hard reset timeout of 6 minutes for PLDM. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: enable dram scramble before linux f/w	Bharat Jauhari	2	-1/+13
	In current code, for dynamic f/w loading flow, DRAM scrambling is enabled post Linux fit image is loaded to the card. This can cause the device CPU to go into reset state. The correct sequence should be: 1. Load boot fit image 2. Enable scrambling 3. Load Linux fit image This commit aligns the DRAM scrambling enabling with the static f/w load flow. Signed-off-by: Bharat Jauhari <bjauhari@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: enable stop on error for all QMANs and engines	Ofir Bitton	1	-0/+1
	If there is an error in the QMAN/engine, there is no point of trying to continue running the workload. It is better to stop to allow the user to debug the program. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: report EQ fault during heartbeat	Ohad Sharabi	1	-1/+7
	In case we have EQ fault we would like to know about it. For this, a status bitmask was added in which EQ_FAULT bit is set by FW in case of EQ fault. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: small code refactoring	Koby Elbaz	1	-1/+1
	Use datatype defines instead of hard coded values, and rename set_fixed_properties function. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: fix mask to obtain page offset	Ohad Sharabi	1	-3/+11
	When converting virtual address to physical we need to add correct offset to the physical page. For this we need to use mask that include ALL bits of page offset. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: skip valid test for boot_dev_sts regs	Ohad Sharabi	1	-26/+30
	Get rid of the need to check if boot_dev_sts is valid on every access to value read from these registers. This is done by storing the register value in hdev props ONLY if register is enabled. This way if register is NOT enabled all capability bits will not be set. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: reset device upon FD close if not idle	Ofir Bitton	4	-12/+19
	If device is not idle after user closes the FD we must reset device as next user that will try to open FD will encounter a non-functional device. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: add debug flag to prevent failure on timeout	Yuri Nudelman	2	-5/+25
	Sometimes it is useful to allow the command to continue running despite the timeout occurred, to differentiate between really stuck or just very time consuming commands. This can be achieved by passing a new debug flag alongside the cs, HL_CS_FLAGS_SKIP_RESET_ON_TIMEOUT. Anyway, if the timeout occurred, a warning print shall be issued, however this shall not fail the submission. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs/gaudi: split host irq interfaces towards FW	Ofir Bitton	1	-2/+24
	Current implementation uses a single interrupt interface towards FW, this interface is causing races between interrupt types. We split this interface to interface per interrupt type. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: prefer ASYNC device probing	Oded Gabbay	1	-1/+5
	There is no dependency when probing multiple devices so indicate to the kernel that it can probe our devices in ASYNC fashion. This shortens insmod of the driver from ~2 minutes to 20 seconds on a system with 8 devices. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: track security status using positive logic	Ohad Sharabi	3	-7/+7
	Using negative logic (i.e. fw_security_disabled) is confusing. Modify the flag to use positive logic (fw_security_enabled). Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs/gaudi: use COMMS to reset device / halt CPU	Koby Elbaz	2	-2/+5
	This is needed because legacy FW 'communication' protocol will soon become obsolete. Because COMMS is a boot protocol, communicating through it is supported only until Linux is loaded to the device CPU, where in that case we will fallback to the former implementation. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs/gaudi: disable GIC usage if security is enabled	Koby Elbaz	2	-11/+16
	Security is set based on PCI ID, and after reading preboot status bits. GIC usage is set in both scenarios since GIC can't be used when security is enabled. Moreover, writing to GIC/SP is enabled only after Linux is fully loaded. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: read preboot status bits in an earlier stage	Koby Elbaz	1	-8/+2
	On newer releases, host won't be able to trigger an interrupt directly to the ASIC GIC controller. To be able to decide whether GIC can/not be used, we must read device's preboot status bits in a stage that precedes the possible first use of GIC (when device is in dirty state). Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18	habanalabs: check running index in eqe control	Oded Gabbay	3	-4/+36
	To harden the event queue mechanism, we add a running index to the control header of the entry. The firmware writes the index in each entry and the driver verifies that the index of the current entry is larger by 1 of the index of the previous entry. In case it isn't, the driver will treat the entry as if it wasn't valid (it won't process it but won't skip it). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>