summaryrefslogtreecommitdiff
path: root/drivers/misc/habanalabs/common/firmware_if.c
AgeCommit message (Collapse)AuthorFilesLines
2021-12-26habanalabs: handle device TPM boot error as warningOfir Bitton1-0/+9
AS TPM error indication is not fatal, driver should dump a warning and continue booting. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: add new opcodes for INFO IOCTLfarah kassabri1-0/+66
Add implementation for new opcodes in the INFO IOCTL: 1. Retrieve the replaced DRAM rows from f/w. 2. Retrieve the pending DRAM rows from f/w. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: don't clear previous f/w indicationsOhad Sharabi1-14/+5
Once we read indication of whether f/w is doing the reset, we don't want to clear it, until the next time we read this indication. Otherwise, we might be in a state of wrong indication. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: use variable poll interval for fw loadingOhad Sharabi1-16/+19
Using a variable poll interval for fw loading allows us to support much slower environments (emulation) while changing only a single line in the code, instead of choosing a different interval in each function that polls. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: adding indication of boot fit loadedOhad Sharabi1-1/+3
Up until now the driver stored indication if Linux was loaded on the device CPU. This was needed in order to coordinate some tasks that are performed by the Linux. In future ASICs, many of those tasks will be performed by the boot fit, so now we need the same indication of boot fit load status. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: revise and document use of boot status flagsOhad Sharabi1-5/+19
The boot status flag "SRAM available" can be set by f/w Linux (in the general case) or by f/w uboot (in some specific debug scenario) but never by f/w preboot. Hence, when polling the boot status flags in the preboot stage we do not want to poll on "SRAM Avialable". The special case in which uboot set this flag is when we are running special debug scenario without Linux. In this case, at some point during the boot, the uboot relocates its code to the DRAM and then set the specified flag. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: modify wait for boot fit in dynamic FW loadOhad Sharabi1-1/+0
In the dynamic FW load protocol the boot status is updated to "Ready to Boot" once uboot is active. Polling on other boot status values is a residue of code duplication from the static protocol and should be removed. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-10-18habanalabs: generalize COMMS message sending procedureAlon Mizrahi1-10/+18
Instead of having dedicated function per message that we want to send to the firmware in COMMS protocol, have a generic function that we can call to from other parts of the driver Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-01habanalabs/gaudi: increase boot fit timeoutOded Gabbay1-0/+4
Various f/w versions have different timeouts, so increase the default timeout to accommodate all the options. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-01habanalabs: clear msg_to_cpu_reg to avoid misread after resetKoby Elbaz1-16/+12
For some ASICs, the f/w reads the msg_to_cpu_reg value after reset, and for some it doesn't. Therefore, to be sure f/w doesn't read a wrong value after reset, we need to clear this register before the reset occurs. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-01habanalabs: expose server type in INFO IOCTLOded Gabbay1-1/+1
Add the server type property to the hl_info_hw_ip_info structure that is exposed to the user via the INFO IOCTL. This is needed by the userspace s/w stack to know the connections map of the internal links that connect the ASIC among themselves inside the server. The F/W will tell us, as part of the NIC information, the server type that the GAUDI is located in. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29habanalabs: fix race between soft reset and heartbeatKoby Elbaz1-5/+13
There is a scenario where an ongoing soft reset would race with an ongoing heartbeat routine, eventually causing heartbeat to fail and thus to escalate into a hard reset. With this fix, soft-reset procedure will disable heartbeat CPU messages and flush the (ongoing) current one before continuing with reset code. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29habanalabs: update firmware header filesOfir Bitton1-2/+2
Update recent changes made in firmware header files, which contain a minor COMMS protocol change and new error status definitions. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21habanalabs/gaudi: refactor hard-reset related codeKoby Elbaz1-0/+41
There is code related to hard-reset, which is done in gaudi specific code. However, this code can be used by future ASICs and therefore it is better to move it to the common code section. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: set rc as 'valid' in case of intentional func exitKoby Elbaz1-1/+4
fix the following smatch warnings: hl_fw_static_init_cpu() warn: missing error code 'rc' Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: zero complex structures using memsetKoby Elbaz1-1/+2
fix the following sparse warnings: 'warning: Using plain integer as NULL pointer' 'warning: missing braces around initializer' Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: print firmware versionsOded Gabbay1-10/+95
Firmware in habanalabs devices is composed of several components. During device initialization, we read these versions from the device. Print them during device initialization to allow better visibility in automated systems. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: enable dram scramble before linux f/wBharat Jauhari1-0/+10
In current code, for dynamic f/w loading flow, DRAM scrambling is enabled post Linux fit image is loaded to the card. This can cause the device CPU to go into reset state. The correct sequence should be: 1. Load boot fit image 2. Enable scrambling 3. Load Linux fit image This commit aligns the DRAM scrambling enabling with the static f/w load flow. Signed-off-by: Bharat Jauhari <bjauhari@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: report EQ fault during heartbeatOhad Sharabi1-1/+7
In case we have EQ fault we would like to know about it. For this, a status bitmask was added in which EQ_FAULT bit is set by FW in case of EQ fault. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: skip valid test for boot_dev_sts regsOhad Sharabi1-26/+30
Get rid of the need to check if boot_dev_sts is valid on every access to value read from these registers. This is done by storing the register value in hdev props ONLY if register is enabled. This way if register is NOT enabled all capability bits will not be set. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: split host irq interfaces towards FWOfir Bitton1-2/+24
Current implementation uses a single interrupt interface towards FW, this interface is causing races between interrupt types. We split this interface to interface per interrupt type. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: track security status using positive logicOhad Sharabi1-1/+1
Using negative logic (i.e. fw_security_disabled) is confusing. Modify the flag to use positive logic (fw_security_enabled). Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: use COMMS to reset device / halt CPUKoby Elbaz1-1/+1
This is needed because legacy FW 'communication' protocol will soon become obsolete. Because COMMS is a boot protocol, communicating through it is supported only until Linux is loaded to the device CPU, where in that case we will fallback to the former implementation. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: disable GIC usage if security is enabledKoby Elbaz1-11/+14
Security is set based on PCI ID, and after reading preboot status bits. GIC usage is set in both scenarios since GIC can't be used when security is enabled. Moreover, writing to GIC/SP is enabled only after Linux is fully loaded. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: read preboot status bits in an earlier stageKoby Elbaz1-8/+2
On newer releases, host won't be able to trigger an interrupt directly to the ASIC GIC controller. To be able to decide whether GIC can/not be used, we must read device's preboot status bits in a stage that precedes the possible first use of GIC (when device is in dirty state). Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: check running index in eqe controlOded Gabbay1-1/+8
To harden the event queue mechanism, we add a running index to the control header of the entry. The firmware writes the index in each entry and the driver verifies that the index of the current entry is larger by 1 of the index of the previous entry. In case it isn't, the driver will treat the entry as if it wasn't valid (it won't process it but won't skip it). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: read GIC sts after FW is loadedKoby Elbaz1-7/+9
Reading of GIC privileged status will be done after F/W is loaded, because privileged GIC capability is only available with the correct ARMCP version, and after it's loaded. Such versions necessarily support COMMS, so GIC alternatives (SP regs) will be read directly from dynamic regs. As well, initiation of DMA QMANs will occur after F/W is loaded since it depends on GIC configuration. In case F/W isn't loaded there's no problem since either way there won't be any GIC IRQ handling. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: send hard reset cause to prebootKoby Elbaz1-1/+126
LKD should provide hard reset cause to preboot prior to loading any FW components (in case needed). Current implementation is based on the new FW 'COMMS' protocol In cased 'COMMS' is disabled - reset cause won't be sent. Currently, only 2 reset causes are shared: HEARTBEAT & TDR. Sending the reset cause will provide the missing watchdog info that the firmware needs to provide to the BMC. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: notify before f/w loadingOded Gabbay1-0/+3
An information print notifying on starting to load the f/w was removed by mistake when moving to the new dynamic f/w loading mechanism. Restore that print as the F/W loading usually takes between 10 to 20 seconds and this print helps the user know the status of the driver load. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: use scratchpad regs instead of GIC controllerKoby Elbaz1-0/+7
Due to new security restrictions, GIC controller can no longer be accessed from user/kernel. To monitor that, a new status bit will be read from preboot caps, indicating whether direct access to GIC is blocked. In case it is blocked, driver will use scratchpad registers instead of using GIC interface on two main scenarios: The first of which LKD triggers interrupts to F/W through GIC, and the second of when LKD configures all engines/QMANs to write to GIC when they want to report an error. From F/W perspective, it will poll on all SPs, and once IRQ number is retrieved, SP register is cleared, and it will perform the write to the GIC to trigger the IRQ handler. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: read f/w's 2-nd sts and err registersOhad Sharabi1-85/+202
Maintain both STS1 and ERR1 registers used for status communication with F/W. Those are not maintained as we currently have less than 31 statuses/error defined and so LKD did not refer to those register. The reason to read them now is to try to support future f/w versions with current driver. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: avoid using uninitialized pointerOhad Sharabi1-0/+1
When attempting to read FW component's version we should break if input FW component is invalid in order to avoid using uninitialized destination pointer. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: ignore device unusable statusOded Gabbay1-3/+3
Some users might want to implement their own policy of when the device is unusable so we need to ignore this status in the driver and continue loading as normal. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: load linux image to deviceOhad Sharabi1-79/+226
Implementing dynamic linux image load to the device. This patch also implements the FW communication steps during the boot-fit. This patch also enables the dynamic protocol based on the compatibility flag. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: load boot fit to deviceOhad Sharabi1-52/+525
Implementing dynamic boot fit image load to the device. Note that some necessary adjustment were added to the static loader as well so that both loaders can co-exist. as this is not the final FW load stage the dynamic FW load is still forced to be non functional. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: modify progress status messagesGuy Nisan1-10/+10
Indicate "progress" instead of "error" when reporting progress status. Change "u-boot stopped by user" to "Cannot boot" message as CPU_BOOT_STATUS_UBOOT_NOT_READY may indicate a fatal error that prevent u-boot from loading firmware. Signed-off-by: Guy Nisan <gnisan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: update to latest f/w headersOded Gabbay1-3/+4
Update the common and GAUDI firmware header files to the latest version. The latest version use the correct endianness types so this commit also contains minor changes to the code to use the correct conversions when reading/writing to the firmware structures. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: dynamic fw load reset protocolOhad Sharabi1-14/+273
First stage of the dynamic FW load protocol is to reset the protocol to avoid residues from former load cycles. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: use common fw_version readOhad Sharabi1-3/+47
Instead of using multiple ASIC specific copies of functions to read the FW version use single common one that gets ASIC specific arguments. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: refactor init device cpu codeOhad Sharabi1-10/+20
Replace multiple arguments to init device CPU function by passing firmware loader managing structure that is initialized per ASIC with the loader parameters. In addition, the FW loader management structure is now part of the habanalabs device, this way the loader parameters will be able to be communicated across various boot stages. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: request f/w in separate functionOhad Sharabi1-21/+44
This refactor is needed due to the dynamic FW load in which requesting the FW file (and getting its attributes) is not immediately followed by copying FW file content. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: prepare preboot stage to dynamic f/w loadOhad Sharabi1-14/+64
Start the skeleton for the dynamic F/W load by marking current preboot code path as legacy. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-05-08habanalabs: ignore f/w status errorOded Gabbay1-1/+2
In case firmware has a bug and erroneously reports a status error (e.g. device unusable) during boot, allow the user to tell the driver to continue the boot regardless of the error status. This will be done via kernel parameter which exposes a mask. The user that loads the driver can decide exactly which status error to ignore and which to take into account. The bitmask is according to defines in hl_boot_if.h Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-05-08habanalabs: change error level of security not readyOded Gabbay1-5/+2
This error indicates a problem in the security initialization inside the f/w so we need to stop the device loading because it won't be usable. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-05-08habanalabs: skip reading f/w errors on bad statusOded Gabbay1-2/+7
If we read all FF from the boot status register, then something is totally wrong and there is no point of reading specific errors. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-05-08habanalabs: expose ASIC specific PLL indexBharat Jauhari1-14/+20
Currently the user cannot interpret the PLL information based on index as its exposed as an integer. This commit exposes ASIC specific PLL indexes and maps it to a generic FW compatible index. Signed-off-by: Bharat Jauhari <bjauhari@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: print f/w boot unknown errorOded Gabbay1-16/+68
We need to print a message to the kernel log in case we encounter an unknown error in the f/w boot to help the user understand what happened. In addition, we shouldn't print unknown error in case of known errors. Moreover, in case of warnings/info, we shouldn't return -EIO that will fail the initialization and mark the device as disabled Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: derive security status from pci idOfir Bitton1-3/+3
As F/ security indication must be available before driver approaches PCI bus, F/W security should be derived from PCI id rather than be fetched during boot handshake with F/W. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: send dynamic msi-x indexes to f/wOhad Sharabi1-0/+67
In order to minimize hard coded values between F/W and the driver, we send msi-x indexes dynamically to the F/W. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: support DEVICE_UNUSABLE error indication from FWKoby Elbaz1-0/+3
In case of multiple ECC errors, FW will set the DEVICE_UNUSABLE bit. On boot-up, the driver will therefore fail inserting the device. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>