|
To be forward-backward compatible with the firmware in the initial
communication during preboot, we need to remove the validation of the
header size. This will allow us to add more fields to the
lkd_fw_comms_desc structure.
Instead of validating the header size, we just print a warning when a
mismatch in the descriptor is detected, and we calculate the CRC based
on the descriptor size reported by the firmware instead of calculating
the size ourselves.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
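A minimal sketch of the idea described above, not the driver's actual code:
the header struct, its field names and the demo_* helpers are hypothetical,
and the real lkd_fw_comms_desc layout differs. It only illustrates warning on
a size mismatch while computing the CRC over the size the firmware reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical descriptor header; the real lkd_fw_comms_desc layout differs. */
struct demo_desc_hdr {
	uint32_t magic;
	uint32_t crc32;   /* CRC of the payload that follows the header */
	uint16_t size;    /* payload size as reported by the firmware */
	uint8_t  version;
	uint8_t  pad;
};

/* Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
uint32_t demo_crc32(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Returns 0 if the CRC over the firmware-reported size matches, -1 otherwise.
 * A size mismatch against what the driver expects only produces a warning.
 */
int demo_validate_desc(const struct demo_desc_hdr *hdr,
		       const uint8_t *payload, size_t buf_len,
		       size_t expected_size)
{
	assert(hdr != NULL && payload != NULL);

	if (hdr->size != expected_size)
		fprintf(stderr,
			"warning: descriptor size %u, driver expects %zu\n",
			(unsigned int)hdr->size, expected_size);

	/* Never read past the buffer we actually own. */
	if (hdr->size > buf_len)
		return -1;

	return demo_crc32(payload, hdr->size) == hdr->crc32 ? 0 : -1;
}
```

With this shape, a firmware that appends new fields still passes the check,
as long as its reported size and CRC are consistent with each other.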
|
|
The user provides a nonce via the ioctl and retrieves secured
attestation data of the boot, generated using the given nonce.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Firmware now responds with more detailed cpucp return codes.
Driver can now distinguish between error and debug return codes.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
F/W security status might change after every reset.
Add the reading of the preboot status to the hard reset sequence, which
among other things reads this security indication.
As this preboot status reading includes the waiting for the preboot to
be ready, it can be removed from the CPU init which is done in a later
stage.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As part of the RAS that is done by the f/w, we should send a message
to the f/w when a user either acquires or releases the device.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
EEPROM errors reported by firmware are basically warnings and
should not fail the boot process.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
There is some left-over code from the gaudi2 bring-up that wasn't
removed so far.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
On Gaudi2 the f/w always configures the PCIe iATU and allows access to
scratchpad registers. Therefore, we can know if the f/w is secured
by reading a status bit from the f/w registers.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Cosmetic commit, no logical changes. It just fixes the spelling
mistakes.
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Doing compute reset can be the traditional inference soft reset
that is supported only in Goya.
Or it can be the new reset upon device release, which is supported
in Gaudi2 and above.
Therefore, wherever suitable, use the terminology of compute reset
instead of soft reset.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
For gaudi2 we need to send a value to F/W as part of the
PCI_ACCESS packet.
As a preparation, modify hl_fw_send_pci_access_msg() to have a 'value'
field.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
If we send a packet to the f/w, and that packet is unsupported, we
want to be able to identify this situation and possibly ignore this.
Therefore, if the f/w returned an error, we need to propagate it
to the callers in the result value, if those callers were interested
in it.
In addition, there is no point in printing the error code here because
each caller prints its own error with a specific message.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
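A rough sketch of the propagation pattern described above. All demo_* names
and the return-code values are hypothetical and are not the driver's API. The
sender stores the firmware code through an optional result pointer and prints
nothing itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical firmware return codes, for illustration only. */
enum demo_fw_rc {
	DEMO_FW_RC_OK = 0,
	DEMO_FW_RC_UNSUPPORTED = 1,
	DEMO_FW_RC_FAULT = 2,
};

/* Stand-in for the transport: pretend opcode 7 is unknown to the firmware. */
static enum demo_fw_rc demo_fw_transport(uint32_t opcode)
{
	return opcode == 7 ? DEMO_FW_RC_UNSUPPORTED : DEMO_FW_RC_OK;
}

/*
 * Send a packet. The firmware code is stored in *result when the caller
 * passed a non-NULL pointer; -EIO is returned on any firmware error.
 * Nothing is printed here: each caller reports its own specific message.
 */
int demo_send_pkt(uint32_t opcode, uint64_t *result)
{
	enum demo_fw_rc rc = demo_fw_transport(opcode);

	if (result)
		*result = (uint64_t)rc;
	return rc == DEMO_FW_RC_OK ? 0 : -EIO;
}

/* A caller that tolerates packets the firmware does not support. */
int demo_optional_feature(uint32_t opcode)
{
	uint64_t result = 0;
	int err = demo_send_pkt(opcode, &result);

	if (err && result == DEMO_FW_RC_UNSUPPORTED)
		return 0;   /* e.g. older firmware: ignore */
	return err;
}
```

Callers that do not care about the firmware code simply pass NULL and act on
the return value alone.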
|
|
We need this property for backward compatibility against the f/w.
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Currently we are not waiting for preboot ready after hard reset.
This leads to a race in which COMMs protocol begins but will get no
response from the f/w.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add the ASIC-specific code for Gaudi2. Supply (almost) all of the
function callbacks that the driver's common code needs to initialize,
finalize and submit workloads to the Gaudi2 ASIC.
It also contains the code to initialize the F/W of the Gaudi2 ASIC
and to receive events from the F/W.
It contains a new debugfs entry to dump razwi events. razwi is a case
where the device's engines create a transaction that reaches an
invalid destination.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This is a pre-requisite patch for adding tracepoints to the DMA memory
operations (allocation/free) in the driver.
The main purpose is to be able to cross-check data with the map
operations and determine whether a memory violation occurred, for
example freeing a DMA allocation before unmapping it from device memory.
To achieve this the DMA alloc/free code flows were refactored so that a
single DMA tracepoint will catch many flows.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
For easier debug, it is desirable to have a simple way
to know whether the device is secured or not, hence we dump this
indication during boot.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
When sending a packet to the FW right after it has reset, we will get
a packet timeout. Since this is expected behavior, we don't need to
print an error in such a case.
Hence, when the driver is in hard reset it will avoid printing error
messages about packet timeout.
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Currently we're using the same poll interval value for both the
COMMs protocol (sending a command and waiting for an ACK)
and the device CPU boot phase status waits.
In the COMMs protocol this interval should be much lower than for the
device CPU boot, which may take a long time to change status.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
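To illustrate why the two waits benefit from separate intervals, here is a
small sketch with a simulated clock. The interval values, the demo_* names and
the clock struct are assumptions made for this example and are not taken from
the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_COMMS_POLL_US   100u      /* assumed: short interval for ACKs */
#define DEMO_BOOT_POLL_US    100000u   /* assumed: long interval for boot status */

/* Simulated clock so the sketch needs no real sleeping. */
struct demo_clock {
	uint64_t now_us;
};

typedef bool (*demo_cond_fn)(uint64_t now_us, void *ctx);

/*
 * Poll cond() every interval_us until it holds or timeout_us elapses.
 * Returns the number of polls on success, or -1 on timeout.
 */
int demo_poll(struct demo_clock *clk, demo_cond_fn cond, void *ctx,
	      uint32_t interval_us, uint64_t timeout_us)
{
	uint64_t deadline;
	int polls = 0;

	assert(clk && cond && interval_us > 0);
	deadline = clk->now_us + timeout_us;

	for (;;) {
		polls++;
		if (cond(clk->now_us, ctx))
			return polls;
		if (clk->now_us >= deadline)
			return -1;
		clk->now_us += interval_us;   /* stands in for a sleep */
	}
}

/* Example condition: true once the clock reaches *ctx microseconds. */
bool demo_ready_at(uint64_t now_us, void *ctx)
{
	return now_us >= *(uint64_t *)ctx;
}
```

For an ACK that arrives after 250 us, polling with DEMO_COMMS_POLL_US notices
it at 300 us, whereas polling with DEMO_BOOT_POLL_US would only notice it at
100000 us.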
|
|
When a Gaudi device is secured, the monitors data in the configuration
space is blocked from PCI access.
As we need to enable the user to get the sync-manager monitors registers
when debugging, this patch adds a debugfs entry that dumps the
information to a binary file (blob).
When a root user triggers the dump, the driver sends a request to
the f/w to fill a data structure containing a dump of all monitors
registers.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We need this property for doing backward compatibility hacks against
the f/w.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When parsing firmware version strings, the driver should not
assume a specific length and should parse up to the maximum supported
version length.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
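One possible shape for such bounded parsing, as a sketch only: the prefix
convention, DEMO_VERSION_MAX_LEN and demo_extract_version() are assumptions
for illustration and do not reflect the driver's actual helper or limits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_VERSION_MAX_LEN 128   /* assumed upper bound, not the driver's value */

/*
 * Copy the version token that follows `prefix` in `str` into `out`.
 * The token ends at the first space or at the end of the string and is
 * truncated to fit both DEMO_VERSION_MAX_LEN and `out`.
 * Returns the copied length, or -1 if the prefix is missing.
 */
int demo_extract_version(const char *str, const char *prefix,
			 char *out, size_t out_len)
{
	const char *start;
	size_t len;

	assert(str && prefix && out && out_len > 0);

	start = strstr(str, prefix);
	if (!start)
		return -1;
	start += strlen(prefix);

	/* The token length is discovered, not assumed. */
	len = strcspn(start, " ");
	if (len > DEMO_VERSION_MAX_LEN)
		len = DEMO_VERSION_MAX_LEN;
	if (len > out_len - 1)
		len = out_len - 1;

	memcpy(out, start, len);
	out[len] = '\0';
	return (int)len;
}
```

For example, calling it on "Preboot fw-1.2.3-rc4 build" with the prefix
"fw-" copies "1.2.3-rc4" regardless of how many characters the version has.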
|
|
During driver and F/W handshake, driver waits for F/W to reach
certain states in order to progress with the boot flow.
Some of the states were deprecated a long time ago and were never
present on official firmwares. Therefore, let's remove them from
the handshake process.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The heartbeat thread is active during soft-reset, and it tries to send
messages to CPU-CP core.
Within the soft-reset, in the time window in which the device is marked
as disabled, any CPU-CP command is "silently" skipped and a success
value is returned.
However, in addition to the return value, the heartbeat function also
checks the F/W result, but because no command is sent in this time
window, the result variable won't hold the expected value and we will
have a false heartbeat failure.
To avoid it, modify the "silent" skip to be done only in hard-reset.
The CPU-CP should be able to handle messages during soft-reset.
In addition to the heartbeat problem, this should also solve other
issues in other flows that send messages during soft-reset and use the
F/W result as is without being aware of the reset.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
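The false heartbeat failure can be reproduced in a small model. This is a
sketch under assumptions: the device struct, its flags, the ACK value and the
old_behavior switch are all hypothetical and exist only to contrast the two
skip conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device state; field names are illustrative only. */
struct demo_dev {
	bool disabled;        /* set during any reset */
	bool in_hard_reset;   /* set only during hard reset */
	unsigned int sent;    /* number of packets really sent */
};

#define DEMO_HEARTBEAT_ACK 0xA5u

/* Stand-in for the CPU-CP queue: always acks. */
static uint32_t demo_fw_exchange(struct demo_dev *dev)
{
	dev->sent++;
	return DEMO_HEARTBEAT_ACK;
}

/*
 * Old behavior: skip whenever the device is marked as disabled.
 * New behavior: skip only while a hard reset is in progress.
 */
int demo_send_cpu_msg(struct demo_dev *dev, uint32_t *result, bool old_behavior)
{
	bool skip;

	assert(dev && result);
	skip = old_behavior ? dev->disabled
			    : (dev->disabled && dev->in_hard_reset);
	if (skip)
		return 0;   /* "silent" success, *result is left untouched */

	*result = demo_fw_exchange(dev);
	return 0;
}

/* The heartbeat checks both the return code and the firmware result. */
int demo_heartbeat(struct demo_dev *dev, bool old_behavior)
{
	uint32_t result = 0;
	int rc = demo_send_cpu_msg(dev, &result, old_behavior);

	return (rc == 0 && result == DEMO_HEARTBEAT_ACK) ? 0 : -1;
}
```

During a soft reset the old condition returns success without filling the
result, so the heartbeat fails; with the new condition the message is really
sent and the heartbeat passes.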
|
|
Add a missing error check in the sysfs show function for max_power.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
If reading PLL info from F/W fails, the PLL info is not set in the
"result" variable, and hence shouldn't be copied to the caller's array.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
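A small sketch of the fix pattern, assuming the firmware packs four 16-bit
outputs into one 64-bit result. That packing, the failing index range and all
demo_* names are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PLL_OUTPUTS 4   /* assumed number of outputs per PLL */

/* Stand-in for the firmware query; fails for PLL indices above 7. */
static int demo_fw_query_pll(uint32_t pll_index, uint64_t *result)
{
	if (pll_index > 7)
		return -EIO;   /* *result is left unset on failure */
	*result = 0x0004000300020001ull + pll_index;
	return 0;
}

/* Unpack four 16-bit frequencies, but only after a successful query. */
int demo_get_pll_freqs(uint32_t pll_index, uint16_t freqs[DEMO_PLL_OUTPUTS])
{
	uint64_t result;
	int rc;

	assert(freqs);

	rc = demo_fw_query_pll(pll_index, &result);
	if (rc)
		return rc;   /* the caller's array stays untouched */

	for (int i = 0; i < DEMO_PLL_OUTPUTS; i++)
		freqs[i] = (uint16_t)(result >> (16 * i));
	return 0;
}
```

Returning before the copy means a failed query can never leak an unset
"result" into the caller's array.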
|
|
Setting PLL profile is the same for all ASICs, except for GOYA.
However, because this function is never called from common code, there
is no need to have an asic-specific callback function.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
For better maintainability, try to concentrate all the common functions
that communicate with the f/w in firmware_if.c
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The two remaining functions in this file belong to firmware_if.c,
as they communicate with the firmware.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Unify variables related to device reset, which will help us to
add some new reset functionality in future patches.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In some cases the driver cannot configure ASID of some engines due to
the security level of the relevant registers.
For this a new CPU-CP packet is introduced, which will allow the driver
to ask the F/W to do this configuration instead.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As BTL can be replaced by ROM, we should modify the relevant error print.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to increase cpucp messaging reliability we will add
the current PI value to the descriptor sent to F/W.
F/W will wait for the PI value as an indication of a valid packet.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Reporting FW errors involves reading the error registers.
In case we have a corrupted FW descriptor we cannot do that, since the
dynamic scratchpad is potentially corrupted as well and may cause a kernel
crash when attempting to access a corrupted register offset.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In new f/w versions, it is required to explicitly indicate the power
information type when querying the F/W for power info.
When getting the current power level it should be set to power_input.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As device boot warnings clear the indication from the error mask,
they must be located together before the unknown error validation.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As the TPM error indication is not fatal, the driver should dump a warning
and continue booting.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add implementation for new opcodes in the INFO IOCTL:
1. Retrieve the replaced DRAM rows from f/w.
2. Retrieve the pending DRAM rows from f/w.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Once we read indication of whether f/w is doing the reset, we don't
want to clear it, until the next time we read this indication.
Otherwise, we might be in a state of wrong indication.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Using a variable poll interval for fw loading allows us to support
much slower environments (emulation) while changing only a single
line in the code, instead of choosing a different interval in each
function that polls.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Up until now the driver stored an indication of whether Linux was loaded
on the device CPU. This was needed in order to coordinate some tasks that
are performed by Linux.
In future ASICs, many of those tasks will be performed by the boot
fit, so now we need the same indication of boot fit load status.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The boot status flag "SRAM available" can be set by f/w Linux (in the
general case) or by f/w uboot (in some specific debug scenario) but
never by f/w preboot.
Hence, when polling the boot status flags in the preboot stage we do not
want to poll on "SRAM available".
The special case in which uboot sets this flag is when we are running a
special debug scenario without Linux. In this case, at some point during
the boot, the uboot relocates its code to the DRAM and then sets the
specified flag.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In the dynamic FW load protocol the boot status is updated to
"Ready to Boot" once uboot is active.
Polling on other boot status values is a residue of code duplication
from the static protocol and should be removed.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Instead of having a dedicated function per message that we want to send
to the firmware in the COMMS protocol, have a generic function that we can
call from other parts of the driver.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Various f/w versions have different timeouts, so increase the default
timeout to accommodate all the options.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
For some ASICs, the f/w reads the msg_to_cpu_reg value after
reset, and for some it doesn't.
Therefore, to be sure f/w doesn't read a wrong value after reset, we
need to clear this register before the reset occurs.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add the server type property to the hl_info_hw_ip_info structure
that is exposed to the user via the INFO IOCTL.
This is needed by the userspace s/w stack to know the connections map
of the internal links that connect the ASICs among themselves inside the
server.
The F/W will tell us, as part of the NIC information, the server type
that the GAUDI is located in.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
There is a scenario where an ongoing soft reset would race with an
ongoing heartbeat routine, eventually causing heartbeat to fail and
thus to escalate into a hard reset.
With this fix, soft-reset procedure will disable heartbeat CPU messages
and flush the (ongoing) current one before continuing with reset code.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Update recent changes made in firmware header files, which contain
a minor COMMS protocol change and new error status definitions.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
There is code related to hard-reset, which is done in gaudi specific
code. However, this code can be used by future ASICs and therefore it
is better to move it to the common code section.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|