Age | Commit message (Collapse) | Author | Files | Lines |
|
AS TPM error indication is not fatal, driver should dump a warning
and continue booting.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add implementation for new opcodes in the INFO IOCTL:
1. Retrieve the replaced DRAM rows from f/w.
2. Retrieve the pending DRAM rows from f/w.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Once we read indication of whether f/w is doing the reset, we don't
want to clear it, until the next time we read this indication.
Otherwise, we might be in a state of wrong indication.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Using a variable poll interval for fw loading allows us to support
much slower environments (emulation) while changing only a single
line in the code, instead of choosing a different interval in each
function that polls.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Up until now the driver stored indication if Linux was loaded on the
device CPU. This was needed in order to coordinate some tasks that are
performed by the Linux.
In future ASICs, many of those tasks will be performed by the boot
fit, so now we need the same indication of boot fit load status.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The boot status flag "SRAM available" can be set by f/w Linux (in the
general case) or by f/w uboot (in some specific debug scenario) but
never by f/w preboot.
Hence, when polling the boot status flags in the preboot stage we do not
want to poll on "SRAM Avialable".
The special case in which uboot set this flag is when we are running
special debug scenario without Linux. In this case, at some point during
the boot, the uboot relocates its code to the DRAM and then set the
specified flag.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In the dynamic FW load protocol the boot status is updated to
"Ready to Boot" once uboot is active.
Polling on other boot status values is a residue of code duplication
from the static protocol and should be removed.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Instead of having dedicated function per message that we want to send
to the firmware in COMMS protocol, have a generic function that we can
call to from other parts of the driver
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Various f/w versions have different timeouts, so increase the default
timeout to accommodate all the options.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
For some ASICs, the f/w reads the msg_to_cpu_reg value after
reset, and for some it doesn't.
Therefore, to be sure f/w doesn't read a wrong value after reset, we
need to clear this register before the reset occurs.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add the server type property to the hl_info_hw_ip_info structure
that is exposed to the user via the INFO IOCTL.
This is needed by the userspace s/w stack to know the connections map
of the internal links that connect the ASIC among themselves inside the
server.
The F/W will tell us, as part of the NIC information, the server type
that the GAUDI is located in.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
There is a scenario where an ongoing soft reset would race with an
ongoing heartbeat routine, eventually causing heartbeat to fail and
thus to escalate into a hard reset.
With this fix, soft-reset procedure will disable heartbeat CPU messages
and flush the (ongoing) current one before continuing with reset code.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Update recent changes made in firmware header files, which contain
a minor COMMS protocol change and new error status definitions.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
There is code related to hard-reset, which is done in gaudi specific
code. However, this code can be used by future ASICs and therefore it
is better to move it to the common code section.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
fix the following smatch warnings:
hl_fw_static_init_cpu() warn: missing error code 'rc'
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
fix the following sparse warnings:
'warning: Using plain integer as NULL pointer'
'warning: missing braces around initializer'
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Firmware in habanalabs devices is composed of several components.
During device initialization, we read these versions from the device.
Print them during device initialization to allow better visibility in
automated systems.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In current code, for dynamic f/w loading flow, DRAM scrambling is
enabled post Linux fit image is loaded to the card. This can cause the
device CPU to go into reset state.
The correct sequence should be:
1. Load boot fit image
2. Enable scrambling
3. Load Linux fit image
This commit aligns the DRAM scrambling enabling with the static f/w load
flow.
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In case we have EQ fault we would like to know about it.
For this, a status bitmask was added in which EQ_FAULT bit is
set by FW in case of EQ fault.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Get rid of the need to check if boot_dev_sts is valid on every access
to value read from these registers.
This is done by storing the register value in hdev props ONLY if
register is enabled.
This way if register is NOT enabled all capability bits will not be set.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Current implementation uses a single interrupt interface towards
FW, this interface is causing races between interrupt types.
We split this interface to interface per interrupt type.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Using negative logic (i.e. fw_security_disabled) is confusing.
Modify the flag to use positive logic (fw_security_enabled).
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This is needed because legacy FW 'communication' protocol will soon
become obsolete.
Because COMMS is a boot protocol, communicating through it is supported
only until Linux is loaded to the device CPU, where in that case we
will fallback to the former implementation.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Security is set based on PCI ID, and after reading preboot status bits.
GIC usage is set in both scenarios since GIC can't be used when security
is enabled.
Moreover, writing to GIC/SP is enabled only after Linux is fully loaded.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
On newer releases, host won't be able to trigger an interrupt directly
to the ASIC GIC controller.
To be able to decide whether GIC can/not be used, we must read device's
preboot status bits in a stage that precedes the possible first use of
GIC (when device is in dirty state).
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
To harden the event queue mechanism, we add a running index to the
control header of the entry.
The firmware writes the index in each entry and the driver verifies
that the index of the current entry is larger by 1 of the index of
the previous entry.
In case it isn't, the driver will treat the entry as if it wasn't
valid (it won't process it but won't skip it).
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Reading of GIC privileged status will be done after F/W is loaded,
because privileged GIC capability is only available with the correct
ARMCP version, and after it's loaded.
Such versions necessarily support COMMS, so GIC alternatives (SP regs)
will be read directly from dynamic regs.
As well, initiation of DMA QMANs will occur after F/W is loaded
since it depends on GIC configuration.
In case F/W isn't loaded there's no problem since either way
there won't be any GIC IRQ handling.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
LKD should provide hard reset cause to preboot prior to
loading any FW components (in case needed).
Current implementation is based on the new FW 'COMMS' protocol
In cased 'COMMS' is disabled - reset cause won't be sent.
Currently, only 2 reset causes are shared: HEARTBEAT & TDR.
Sending the reset cause will provide the missing watchdog
info that the firmware needs to provide to the BMC.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
An information print notifying on starting to load the f/w was removed
by mistake when moving to the new dynamic f/w loading mechanism.
Restore that print as the F/W loading usually takes between 10 to 20
seconds and this print helps the user know the status of the driver
load.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Due to new security restrictions, GIC controller can no
longer be accessed from user/kernel.
To monitor that, a new status bit will be read from preboot
caps, indicating whether direct access to GIC is blocked.
In case it is blocked, driver will use scratchpad registers
instead of using GIC interface on two main scenarios:
The first of which LKD triggers interrupts to F/W through GIC,
and the second of when LKD configures all engines/QMANs
to write to GIC when they want to report an error.
From F/W perspective, it will poll on all SPs, and once IRQ
number is retrieved, SP register is cleared, and it will perform the
write to the GIC to trigger the IRQ handler.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Maintain both STS1 and ERR1 registers used for status communication
with F/W.
Those are not maintained as we currently have less than 31
statuses/error defined and so LKD did not refer to those register.
The reason to read them now is to try to support future f/w versions
with current driver.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
When attempting to read FW component's version we should break if input
FW component is invalid in order to avoid using uninitialized
destination pointer.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Some users might want to implement their own policy of when the device
is unusable so we need to ignore this status in the driver and continue
loading as normal.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Implementing dynamic linux image load to the device.
This patch also implements the FW communication steps during the
boot-fit.
This patch also enables the dynamic protocol based on the compatibility
flag.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Implementing dynamic boot fit image load to the device.
Note that some necessary adjustment were added to the static loader as
well so that both loaders can co-exist.
as this is not the final FW load stage the dynamic FW load is still
forced to be non functional.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Indicate "progress" instead of "error" when reporting progress status.
Change "u-boot stopped by user" to "Cannot boot" message as
CPU_BOOT_STATUS_UBOOT_NOT_READY may indicate a fatal error that prevent
u-boot from loading firmware.
Signed-off-by: Guy Nisan <gnisan@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Update the common and GAUDI firmware header files to the latest version.
The latest version use the correct endianness types so this commit also
contains minor changes to the code to use the correct conversions when
reading/writing to the firmware structures.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
First stage of the dynamic FW load protocol is to reset the protocol to
avoid residues from former load cycles.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Instead of using multiple ASIC specific copies of functions to read the
FW version use single common one that gets ASIC specific arguments.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Replace multiple arguments to init device CPU function by passing
firmware loader managing structure that is initialized per ASIC with
the loader parameters.
In addition, the FW loader management structure is now part of the
habanalabs device, this way the loader parameters will be able to be
communicated across various boot stages.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This refactor is needed due to the dynamic FW load in which requesting
the FW file (and getting its attributes) is not immediately followed by
copying FW file content.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Start the skeleton for the dynamic F/W load by marking current preboot
code path as legacy.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In case firmware has a bug and erroneously reports a status error
(e.g. device unusable) during boot, allow the user to tell the driver
to continue the boot regardless of the error status.
This will be done via kernel parameter which exposes a mask. The
user that loads the driver can decide exactly which status error to
ignore and which to take into account. The bitmask is according to
defines in hl_boot_if.h
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This error indicates a problem in the security initialization inside
the f/w so we need to stop the device loading because it won't be
usable.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
If we read all FF from the boot status register, then something is
totally wrong and there is no point of reading specific errors.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Currently the user cannot interpret the PLL information based on index
as its exposed as an integer.
This commit exposes ASIC specific PLL indexes and maps it to a generic
FW compatible index.
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
We need to print a message to the kernel log in case we encounter
an unknown error in the f/w boot to help the user understand what
happened.
In addition, we shouldn't print unknown error in case of known errors.
Moreover, in case of warnings/info, we shouldn't return -EIO that will
fail the initialization and mark the device as disabled
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As F/ security indication must be available before driver approaches
PCI bus, F/W security should be derived from PCI id rather than be
fetched during boot handshake with F/W.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to minimize hard coded values between F/W and the driver, we
send msi-x indexes dynamically to the F/W.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In case of multiple ECC errors, FW will set the DEVICE_UNUSABLE bit.
On boot-up, the driver will therefore fail inserting the device.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|