Age | Commit message (Collapse) | Author | Files | Lines |
|
There are no internal queues if H/W queues are being used.
In this case we can skip the redundant traversal over the queues array,
looking for internal queues.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Slightly refactor the cs_do_release() function, to reduce nesting level
and to ease the handling of future CS types.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Refactor the CS IOCTL handling by gathering common code into
sub-functions, in order to ease future additions of new CS types.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Once FW security is enabled there is no access to PLL registers,
need to read values from FW using a dedicated interface.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This commit refactors the MMU code to support PCI MMU page tables
residing on host and DCORE MMU residing on the device DRAM at the
same time.
This is needed for future devices as on GAUDI and GOYA we have
a single MMU where its page tables always reside on DRAM.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This commit fixes an incorrect error message
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In cases of multi-tenants, administrators may want to prevent data
leakage between users running on the same device one after another.
To do that the driver can scrub the internal memory (both SRAM and
DRAM) after a user finish to use the memory.
Because in GAUDI the driver allows only one application to use the
device at a time, it can scrub the memory when user app close FD.
In future devices where we have MMU on the DRAM, we can scrub the DRAM
memory with a finer granularity (page granularity) when the user
allocates the memory.
This feature is not supported in Goya.
To allow users that want to debug their applications, we add a kernel
module parameter to load the driver with this feature disabled.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add support for fetching security indication from FW.
This indication is needed in order to skip unnecessary
initializations done by FW.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Fix cs counters structure in uapi to be one flat structure instead
of two instances of the same other structure.
use atomic read/increment for context counters so we could use
one structure for both aggregated and context counters.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Today driver is able to load a whole FW binary into a specific
location on ASIC. We add support for loading sections from the
same FW binary into different loactions.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
GCC 7.3.1 20180303 (Red Hat 7.3.1-5) complains that collective_engine_id
might be used uninitialized.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Although we get a valid cs type from the callee, in case new values
will be added in the future, it is best to check the expected values
in that function.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In GAUDI we don't have an MMU towards the HBM device memory. Therefore,
the user access that memory directly through physical address (via the
different engines) without the need to go through the driver to
allocate/free memory on the HBM.
For system monitoring purposes, the driver will keep track of the HBM
usage. This can be done as long as the user accurately reports the
allocations and releases of HBM memory, through the existing MEMORY
IOCTL uapi.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Implement sync stream collective for GAUDI. Need to allocate additional
resources for that and add ctx_fini() to clean up those resources.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Define new API for collective wait support and modify sync stream
common flow. In addition add kernel CB allocation support for
internal queues.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In the future there will be situations where queues can accept either
kernel allocated CBs or user allocated CBs, depending on different
states.
Therefore, instead of using a boolean variable of kernel/user allocated
CB, we need to use a bitmask to indicate that, which will allow to
combine the two options.
Add a flag to the uapi so the user will be able to indicate whether
the CB was allocated by kernel or by user. Of course the driver
validates that.
Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Initialize the QMANs that are responsible to submit doorbells to the NIC
engines. Add support for stopping and disabling them, and reset them as
part of the hard-reset procedure of GAUDI. This will allow the user to
submit work to the NICs.
Add support for receiving events on QMAN errors from the firmware.
However, the nic_ports_mask is still initialized to 0. That means this code
won't initialize the QMANs just yet. That will be in a later patch.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
We already check if queue index is smaller than max queues a few lines
above this check so no need to check this again.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Refactor sync stream implementation by reducing function length
for better readability.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Support advanced monitor functionality to monitor more than a
single SOB. In addition expand all CB generation functions
with buffer offset in order to put in them multiple packets that are
generated by different functions.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Refactor sync stream implementation by adding more structures for
better readability. In addition reducing allocated resources.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In case we are running without MMU enabled (debug mode), no need to
initialize the VM module in the driver.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The driver now loads the firmware in two stages. For debugging purposes
we need to support situations where only the first stage firmware is
loaded.
Therefore, use a bitmask to determine which F/W is loaded
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
F/W can be loaded but device CPU queues disabled. In that case, HWMON
should be disabled. This is only relevant when debugging
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In case we will have multiple contexts/processes, we can't just
increment aggregated counters. We need to make them atomic as they can
be incremented by multiple processes
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Driver never puts its device and control_device objects, hence
a memory leak is introduced every driver removal.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
If huge range is not valid, driver uses the host range also for
huge page allocations, but driver never frees its allocation.
This introduces a memory leak every time a user closes its context.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
All throughout the driver, normal kernel pointers are
stored as 'u64' struct members, which is kind of silly
and requires casting through a uintptr_t to void* every
time they are used.
There is one line that missed the intermediate uintptr_t
case, which leads to a compiler warning:
drivers/misc/habanalabs/common/command_buffer.c: In function 'hl_cb_mmap':
drivers/misc/habanalabs/common/command_buffer.c:512:44: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
512 | rc = hdev->asic_funcs->cb_mmap(hdev, vma, (void *) cb->kernel_address,
Rather than adding one more cast, just fix the type and
remove all the other casts.
Fixes: 0db575350cb1 ("habanalabs: make use of dma_mmap_coherent")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The device should be idle after a context is closed. If not, print a
notice.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
During debugging of error we sometimes need to know whether the error
happened when a user context was open. Add debug prints when opening and
closing user contexts.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Some engines use resources that belong to the kernel context (e.g. MMU
mappings). In case the halt-engines doesn't work properly due to H/W
restriction, we need to make sure the kernel context lives on until after
the hw_fini. The hw_fini resets the ASIC after that no engine is alive and
we can safely close the kernel context.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
We don't try to allocate huge pages here so remove the huge word.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Future F/W versions will have enhanced security measures and the driver
won't be able to do certain configurations that it always did and those
configurations will be done by the firmware.
We use the firmware's preboot version to determine whether security
measures are enabled or not. Because we need this very early in our code,
the read of the preboot version is moved to the earliest possible place,
right after the device's PCI initialization.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
There are cases in which the device should access the host memory of a
CB through the device MMU, and thus this memory should be mapped.
The patch adds a flag to the CB IOCTL, in which a user can ask the
driver to perform the mapping when creating a CB.
The mapping is allowed only if a dedicated VA range was allocated for
the specific ASIC.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Future changes require using a context while handling a command buffer,
and thus need to save the context in the command buffer object.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
The user sometimes wants to check if a CS has completed to clean resources.
In that case, the user doesn't want to sleep but just to check if the CS
has finished and continue with his code.
Add a new definition to the API of the wait on CS. The new definition says
that if the timeout is 0, the driver won't sleep at all but return
immediately after checking if the CS has finished.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
This commit modify the existing debugfs code to support future devices that
have a 6 HOPs MMU implementation instead of 5 HOPs implementation.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
This commit adds the number of HOPs supported by the device to the
device MMU properties.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
As preparation to MMU v2, rework MMU to be device oriented
instantiated according to the device in hand.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
In the future we will have MMU v2 code, so we need to prepare the
driver for it. The first step is to rename the current MMU file to
mmu_v1.c.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Change the acquiring of a device virtual address for mapping by using the
smallest possible alignment, rather than the biggest, depending on the
page size used by the user for allocating the memory. This will lower the
virtual space memory consumption.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Old function pointer that was left when the call to this function pointer
was removed.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
There were a couple of comments where the name ArmCP was still used. Rename
it to CPU-CP.
In addition, rename ArmCP or ARM in log messages to "device CPU".
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
There is a case where the user reaches the maximum number of CS in-flight.
In that case, the driver rejects the new CS of the user with EAGAIN. Count
that event so the user can query the driver later to see if it happened.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Add dma_mmap_coherent() for goya and gaudi to match their use of
dma_alloc_coherent(), see the Link tag for why.
Link: https://lore.kernel.org/lkml/20200609091727.GA23814@lst.de/
Cc: Christoph Hellwig <hch@lst.de>
Cc: Zhang Li <li.zhang@bitmain.com>
Cc: Ding Z Nan <oshack@hotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
The driver use vm_pgoff to hold the CB idr handle. Before we actually call
the mapping function, we need to clear the handle so there won't be any
garbage left in vm_pgoff.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Arrange the hl_mmap code to be more structured and expandable for the
future. Add better defines that describe our usage of the vm_pgoff.
Note that I shamelessly took the code and defines from the amdkfd driver
(my previous driver).
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
ArmCP mandates that the device CPU is always an ARM processor, which might
be wrong in the future.
Most of this change is an internal renaming of variables, functions and
defines but there are two entries in sysfs which have armcp in their
names. Add identical cpucp entries but don't remove yet the armcp entries.
Those will be deprecated next year. Add the documentation about it in sysfs
documentation.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Add driver implementation for reading the total energy consumption
from the device ARM FW.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Include linux/bitfield.h only in habanalabs.h, instead of in each and
every file that needs it, as habanalabs.h is already included by all.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|