path: root/drivers/misc/habanalabs
Age | Commit message | Author | Files | Lines
2022-07-12 | habanalabs: move h/w dirty message to debug | Oded Gabbay | 3 | -5/+3
H/W being dirty during initialization is completely expected if f/w tools were used before the driver was loaded. As it is not an error, and as it doesn't give the user any meaningful information, there is no point in printing it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: rename soft reset to compute reset | Oded Gabbay | 8 | -30/+30
A compute reset can be either the traditional inference soft reset, which is supported only in Goya, or the new reset upon device release, which is supported in Gaudi2 and above. Therefore, wherever suitable, use the term compute reset instead of soft reset. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: add status of reset after device release | Oded Gabbay | 2 | -7/+16
The user might want to know that the device is in reset after device release, which, unlike a regular reset, is not an erroneous event. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: fix update of is_in_soft_reset | Oded Gabbay | 1 | -5/+15
reset_info.is_in_soft_reset should be updated both before in_reset and inside the spinlock of the reset info structure. The reasons are:
- Being in soft reset implies being in reset. Therefore, anyone who sees that we are in soft reset can deduce that we are in reset, while the opposite deduction is not correct and might be misleading.
- Both flags are changed together, so they must be changed inside the reset info spinlock.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
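The ordering rule above can be illustrated with a minimal userspace sketch. It is not the driver's code: the kernel spinlock is modelled with a pthread mutex, and struct reset_info_model, reset_begin() and reset_end() are hypothetical names. Both flags change inside a single critical section, and the soft-reset flag is written before in_reset.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the reset info structure; a pthread mutex
 * stands in for the kernel spinlock. */
struct reset_info_model {
	pthread_mutex_t lock;
	bool in_reset;
	bool is_in_soft_reset;
};

/* Starts a reset; returns false if one is already in progress. */
static bool reset_begin(struct reset_info_model *ri, bool soft)
{
	bool started = false;

	pthread_mutex_lock(&ri->lock);
	if (!ri->in_reset) {
		/* Both flags are written under the same lock, so a reader that
		 * also takes the lock always sees them change together. The
		 * more specific flag is written first, as the commit describes. */
		ri->is_in_soft_reset = soft;
		ri->in_reset = true;
		started = true;
	}
	pthread_mutex_unlock(&ri->lock);
	return started;
}

/* Ends the reset, clearing the flags in reverse order under the lock. */
static void reset_end(struct reset_info_model *ri)
{
	pthread_mutex_lock(&ri->lock);
	ri->is_in_soft_reset = false;
	ri->in_reset = false;
	pthread_mutex_unlock(&ri->lock);
}
```

With this arrangement, no locked reader can observe is_in_soft_reset set while in_reset is clear.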
2022-07-12 | habanalabs: expose only valid debugfs nodes | Ofir Bitton | 1 | -44/+50
If security is enabled on the device, some debugfs nodes will fail. Hence, we do not expose them in that case. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi2: map virtual MSI-X doorbell memory for user | Tomer Tayar | 2 | -2/+44
Upon the initialization of a user context, map the host memory page of the virtual MSI-X doorbell in the device MMU. A reserved VA is used for this purpose, so the user can use it directly without any allocation or map operation. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi2: modify decoder to use virtual MSI-X doorbell | Tomer Tayar | 4 | -11/+161
Modify the decoder wrapper blocks to generate interrupts using the virtual MSI-X doorbell. As a decoder wrapper block cannot write directly to HBW upon completion, it instead writes to a SOB that is monitored by a master monitor. When the monitor is resolved, it is the one that actually writes to the virtual MSI-X doorbell. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi2: modify CS completion CQ to use virtual MSI-X doorbell | Tomer Tayar | 1 | -3/+16
Modify the CQ that is used for CS completion to use the virtual MSI-X doorbell. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi2: replace defines for reserved sob/mob with enums | Tomer Tayar | 3 | -32/+46
Upcoming patches will add more reserved sync objects and monitors. To simplify counting these reserved resources, replace the existing RESERVED_* defines with enumerations. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
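The benefit of enumerations over hand-numbered defines is that a trailing "count" member updates itself when entries are added. A minimal sketch follows; the enumerator names and the range size of 10 are hypothetical, since the commit does not list the actual Gaudi2 reserved SOBs.

```c
#include <assert.h>

/* Hypothetical reserved sync objects; only the pattern is the point. */
enum reserved_sob {
	RESERVED_SOB_CS_COMPLETION,
	RESERVED_SOB_KDMA_COMPLETION,
	RESERVED_SOB_DEC_NRM_FIRST,
	/* a contiguous range of 10 reserved SOBs */
	RESERVED_SOB_DEC_NRM_LAST = RESERVED_SOB_DEC_NRM_FIRST + 9,
	RESERVED_SOB_NUMBER	/* must stay last: total number of reserved SOBs */
};
```

Adding a new entry before RESERVED_SOB_NUMBER automatically shifts the count, so code that treats RESERVED_SOB_NUMBER as the first non-reserved index needs no manual update.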
2022-07-12 | habanalabs/gaudi2: configure virtual MSI-X doorbell interface | Tomer Tayar | 3 | -3/+28
Due to a watchdog timer in the LBW path, writes to the MSI-X doorbell can sporadically return error responses. To work around this issue, a virtual MSI-X doorbell is configured on the HBW path, using the MSI-X AXI slave interface in the PCIe controller. Upon an access to a configured HBW host address, the controller generates an MSI-X interrupt instead of treating the access as a regular host memory access. This patch allocates the dedicated host memory page and communicates its address to the F/W, which configures the relevant address match registers in the controller and uses this address to generate MSI-X interrupts for F/W events. Following patches will move the other initiators in the device to the virtual MSI-X doorbell. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: add a value field to hl_fw_send_pci_access_msg() | Tomer Tayar | 6 | -15/+14
For Gaudi2, we need to send a value to the F/W as part of the PCI_ACCESS packet. In preparation, modify hl_fw_send_pci_access_msg() to have a 'value' field. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: fixes to the poll-timeout macros | Ohad Sharabi | 1 | -29/+90
- use conventional internal macro variables (double-underscore prefix)
- adjust address casting
- when polling a register using ELBI, use an ELBI read rather than a BAR read on the error condition
- remove an unused macro
Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
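The points above (internal variables with a double-underscore prefix, and a final read through the same accessor on timeout) can be shown in a userspace sketch. This is not the driver's hl_poll_timeout(): the macro name poll_timeout_iters, the iteration budget used instead of a time budget, and the fake register are all assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Poll read_fn(addr) into val until cond holds or max_iters reads pass.
 * ret is 0 on success or -ETIMEDOUT. Internal variables use a __ prefix
 * (kernel convention) so they cannot collide with caller variables. */
#define poll_timeout_iters(read_fn, addr, val, cond, max_iters, ret)	\
	do {								\
		unsigned int __iters = 0;				\
		(ret) = 0;						\
		for (;;) {						\
			(val) = read_fn(addr);				\
			if (cond)					\
				break;					\
			if (++__iters >= (max_iters)) {			\
				/* final read via the same accessor */	\
				(val) = read_fn(addr);			\
				(ret) = (cond) ? 0 : -ETIMEDOUT;	\
				break;					\
			}						\
		}							\
	} while (0)

/* Hypothetical "register" whose value advances on every read. */
static uint32_t counter_reg;

static uint32_t read_counter(uint32_t addr)
{
	(void)addr;
	return ++counter_reg;
}
```

Usage: poll_timeout_iters(read_counter, 0, v, v >= 3, 10, rc) leaves rc == 0 and v == 3, while a condition that never becomes true yields rc == -ETIMEDOUT.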
2022-07-12 | habanalabs/gaudi2: use DIV_ROUND_UP_SECTOR_T instead of roundup | Ohad Sharabi | 1 | -4/+5
roundup() causes an error on 32-bit architectures because it performs a 64-bit division on our 64-bit variables. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: initialize variable explicitly | Oded Gabbay | 1 | -1/+1
Fix the warning "‘old_base’ may be used uninitialized in this function". Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: Use the bitmap API to allocate bitmaps | Christophe JAILLET | 1 | -3/+2
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and improves the semantics. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
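In the kernel, the dedicated helper is bitmap_zalloc(nbits, GFP_KERNEL), which replaces a hand-written kcalloc(BITS_TO_LONGS(nbits), sizeof(long), ...). The userspace sketch below mirrors that idea with calloc(); the ubitmap_* names and the *_U macros are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_LONG_U (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_U(n) (((n) + BITS_PER_LONG_U - 1) / BITS_PER_LONG_U)

/* The size calculation lives in one helper instead of at every call site. */
static unsigned long *ubitmap_zalloc(size_t nbits)
{
	return calloc(BITS_TO_LONGS_U(nbits), sizeof(unsigned long));
}

static void ubitmap_free(unsigned long *map)
{
	free(map);
}

static void ubitmap_set(unsigned long *map, size_t bit)
{
	map[bit / BITS_PER_LONG_U] |= 1UL << (bit % BITS_PER_LONG_U);
}

static bool ubitmap_test(const unsigned long *map, size_t bit)
{
	return map[bit / BITS_PER_LONG_U] & (1UL << (bit % BITS_PER_LONG_U));
}
```

Pairing the allocator with its own free function also keeps the allocation and release sites symmetric, which is the "semantics" improvement the commit refers to.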
2022-07-12 | habanalabs/gaudi2: remove unused defines | Oded Gabbay | 3 | -13/+0
Remove some defines that are unused in the currently upstreamed code. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: make sure variable is set before used | Oded Gabbay | 1 | -1/+1
timestamp could remain unset in both _hl_interrupt_wait_ioctl() and _hl_interrupt_wait_ioctl_user_addr(), so it is better to explicitly initialize it to 0 when declaring it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: don't declare tmp twice in same function | Oded Gabbay | 1 | -2/+2
tmp is declared both in the scope of the function cs_do_release() and in a block inside that function. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: do not set max power on a secured device | Ofir Bitton | 1 | -2/+4
The max power API is not supported on secured devices. Hence, we should skip setting it during boot. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi2: SM mask can only be 8-bit | Oded Gabbay | 1 | -1/+2
Otherwise, because of how we calculate it, it might fail the FIELD_PREP checks. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
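The kernel's FIELD_PREP(mask, val) shifts val into the bit field described by mask and rejects values that do not fit. The sketch below is a runtime analogue, not the kernel macro; the field position (bits 15:8) and the helper names are hypothetical. It shows why keeping the computed mask in an 8-bit type guarantees it fits an 8-bit field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 8-bit field at bits [15:8] of a 32-bit register. */
#define SM_MASK_FIELD 0x0000FF00u

/* Number of trailing zero bits in a non-zero mask. */
static unsigned int mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	while (!(mask & 1u)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Runtime analogue of FIELD_PREP(): returns false when val is too large
 * for the field, otherwise stores the shifted value in *out. */
static bool field_prep_checked(uint32_t mask, uint32_t val, uint32_t *out)
{
	unsigned int shift = mask_shift(mask);

	if (val > (mask >> shift))
		return false;
	*out = (val << shift) & mask;
	return true;
}

/* Hypothetical mask calculation; the uint8_t return type caps it at 8 bits. */
static uint8_t sm_mask_from_index(unsigned int sm_idx)
{
	return (uint8_t)(1u << (sm_idx % 8));
}
```

A value such as 0x100, which an unconstrained 32-bit calculation could produce, is rejected, while anything returned by sm_mask_from_index() always fits.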
2022-07-12 | habanalabs/gaudi2: remove unused variable | Oded Gabbay | 1 | -7/+3
glbl_sts_clr_val was set but never used Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: allow detection of unsupported f/w packets | Oded Gabbay | 1 | -4/+8
If we send a packet to the f/w and that packet is unsupported, we want to be able to identify this situation and possibly ignore it. Therefore, if the f/w returns an error, we need to propagate it to the callers through the result value, if those callers are interested in it. In addition, there is no point in printing the error code here because each caller prints its own error with a specific message. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
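The propagation pattern described above can be sketched as follows: the sender stores the raw f/w status in an optional result pointer and does not print anything, leaving the decision and the message to the caller. Everything here is hypothetical for illustration: the status codes, the simulated f/w, the function names, and the choice of -EIO as the error return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical f/w status codes. */
enum fw_status { FW_OK = 0, FW_ERR_UNSUPPORTED = 2 };

/* Simulated f/w: rejects opcodes it does not know. */
static enum fw_status fake_fw_handle(uint32_t opcode)
{
	return opcode <= 10 ? FW_OK : FW_ERR_UNSUPPORTED;
}

/* Sends a packet. If the caller passed a result pointer, the raw f/w
 * status is stored there. No error is printed here: each caller reports
 * its own, more specific message. */
static int send_fw_packet(uint32_t opcode, uint64_t *result)
{
	enum fw_status st = fake_fw_handle(opcode);

	if (result)
		*result = (uint64_t)st;
	return st == FW_OK ? 0 : -EIO;
}

/* A caller that tolerates f/w versions lacking this packet. */
static int set_optional_feature(uint32_t opcode)
{
	uint64_t result = 0;
	int rc = send_fw_packet(opcode, &result);

	if (rc && result == FW_ERR_UNSUPPORTED)
		return 0;	/* unsupported by this f/w: ignore */
	return rc;
}
```

Callers that do not care about the detailed status can pass NULL and still get the error return code.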
2022-07-12 | habanalabs: save f/w preboot minor version | Sagiv Ozeri | 2 | -9/+44
We need this property for backward compatibility with the f/w. Signed-off-by: Sagiv Ozeri <sozeri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: add support for common decoder interrupts | Ofir Bitton | 4 | -6/+15
A user application should be able to get a notification for any decoder completion. Hence, we introduce a new interface through which a user can wait for all currently pending decoder interrupts. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: naming refactor of user interrupt flow | Ofir Bitton | 4 | -16/+16
The current naming convention can be misleading. Hence, rename some variables and defines to be more explicit. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: wait for preboot ready after hard reset | Ohad Sharabi | 5 | -42/+107
Currently, we do not wait for preboot to be ready after a hard reset. This leads to a race in which the COMMs protocol begins but gets no response from the f/w. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi2: reset device upon critical ECC event | Ofir Bitton | 2 | -11/+16
Correctable ECC events are not fatal, but as they accumulate, the f/w can decide that a hard reset is required. This indication is propagated to the host using the existing ECC event interface. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: enable gaudi2 code in driver | Oded Gabbay | 3 | -7/+63
Enable the Gaudi2 ASIC code in the PCI probe callback of the driver so the driver will handle Gaudi2 ASICs. Add the PCI ID to the PCI table and add the ASIC enum value to all relevant places. Fix up the device parameters initialization for Gaudi2. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: add gaudi2 MMU support | Moti Haimovski | 7 | -33/+1064
Gaudi2 has new MMU units: a PMMU for device->host accesses and an HMMU for HBM accesses. The page tables of both MMUs are located in the host's memory (referred to in the code as host-resident pgt). Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: add gaudi2 wait-for-CS support | Oded Gabbay | 7 | -89/+210
In Gaudi2 we moved to a different model of waiting for command submission completion. Instead of receiving an interrupt only on external queues, we use the device's sync manager to notify us when the entire command submission finishes. This enables us to remove the categorization of queues into external and internal and to treat all queues equally, without the need to parse and patch any command buffer. This change also requires refactoring the IRQ handling of CS completions. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
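The sync-manager completion model can be sketched as a counter and a threshold: every job of the command submission increments one sync object (SOB) when it finishes, whatever queue it ran on, and a monitor armed with the job count fires once the SOB reaches it. The model below is a hypothetical simplification in software; struct sob, struct monitor and the helper names are not the driver's, and the real hardware monitor writes to a completion queue or doorbell rather than setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sync object: a counter incremented by finished jobs. */
struct sob {
	uint32_t value;
};

/* Hypothetical monitor armed on one SOB with a target value. */
struct monitor {
	struct sob *sob;
	uint32_t target;	/* number of jobs in the command submission */
	bool fired;
};

static void monitor_arm(struct monitor *mon, struct sob *sob, uint32_t num_jobs)
{
	mon->sob = sob;
	mon->target = num_jobs;
	mon->fired = false;
}

/* Called when any queue finishes a job of the CS; no queue is special. */
static void job_done(struct monitor *mon)
{
	mon->sob->value++;
	if (!mon->fired && mon->sob->value >= mon->target)
		mon->fired = true;	/* in HW: the monitor signals completion */
}
```

Because completion depends only on the count, there is no need to distinguish external from internal queues or to patch command buffers to emit per-queue completion messages.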
2022-07-12 | habanalabs/gaudi2: add gaudi2 profiler module | Benjamin Dotan | 4 | -2/+3792
Add the Gaudi2 code to initialize the ASIC's profiler. The profiler receives its initialization values from the user, same as in Gaudi, but the initialization code is in the driver because the configuration space of the device is not directly exposed to the user. Signed-off-by: Benjamin Dotan <bdotan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi2: add gaudi2 security module | Ofir Bitton | 4 | -2/+3867
Use the generic security module to block all registers in the ASIC and then open only those that the user needs to access. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: add generic security module | Ofir Bitton | 3 | -2/+671
As the ASICs become more complex and have many more registers, we need a better way to configure the security properties. As a reminder, we have two dedicated mechanisms for security: Range Registers and Protection bits. Those mechanisms protect sensitive memory and configuration areas inside the device. The generic module handles the low-level part of the configuration, because the configuration mechanism is identical in all ASICs. What differs are the address ranges and register names. Any ASIC that uses this block should first block all the register blocks in the ASIC. Then, it should open only the registers that need to be accessed by the user (as opposed to Goya and Gaudi, where we blocked only what should not be accessed by the user). The module contains several functions to unblock a single register, multiple registers, entire blocks, ranges, and ranges with a mask. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
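The "block everything, then open selectively" approach can be modelled with protection bits held in software. In this sketch the layout is assumed for illustration only: 4 register blocks of 32 registers each, one bit per 32-bit register, where a set bit means the register is protected from the user. The names pb, block_all(), unblock_reg(), unblock_range() and unblock_block() are hypothetical, not the module's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS	4	/* hypothetical */
#define REGS_PER_BLOCK	32	/* one protection bit per 32-bit register */
#define BLOCK_SIZE	(REGS_PER_BLOCK * 4u)

/* pb[b] bit r set => register r of block b is protected from the user. */
static uint32_t pb[NUM_BLOCKS];

/* Step 1: protect every register in every block. */
static void block_all(void)
{
	for (int b = 0; b < NUM_BLOCKS; b++)
		pb[b] = UINT32_MAX;
}

/* Open one register, given its byte offset (assumed in range). */
static void unblock_reg(uint32_t offset)
{
	uint32_t b = offset / BLOCK_SIZE;
	uint32_t r = (offset % BLOCK_SIZE) / 4;

	pb[b] &= ~(1u << r);
}

/* Open every register in [start, end), end exclusive. */
static void unblock_range(uint32_t start, uint32_t end)
{
	for (uint32_t off = start; off < end; off += 4)
		unblock_reg(off);
}

/* Open an entire block. */
static void unblock_block(uint32_t b)
{
	pb[b] = 0;
}

static bool is_user_accessible(uint32_t offset)
{
	uint32_t b = offset / BLOCK_SIZE;
	uint32_t r = (offset % BLOCK_SIZE) / 4;

	return !(pb[b] & (1u << r));
}
```

Starting from "all protected" means a register forgotten in the ASIC's allow-list stays blocked, whereas the older deny-list approach left a forgotten register open.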
2022-07-12 | habanalabs: remove obsolete device variables used for testing | Oded Gabbay | 5 | -173/+24
There are a couple of device variables that are used for testing purposes and they are set to fixed values. Remove the variables that are not relevant anymore and document the remaining variables. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: initialize new asic properties | Oded Gabbay | 3 | -14/+28
New ASIC properties were added for Gaudi2. We want to initialize and use them for Goya and Gaudi as well, when relevant. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: add unsupported functions | Oded Gabbay | 2 | -0/+42
There are a number of new ASIC-specific functions that were added for Gaudi2. To make the common code work, we need to define empty implementations of those functions for Goya and Gaudi. Some functions will return an error if called on Goya/Gaudi. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: add gaudi2 asic-specific code | Oded Gabbay | 22 | -97/+11155
Add the ASIC-specific code for Gaudi2. Supply (almost) all of the function callbacks that the driver's common code needs to initialize, finalize and submit workloads to the Gaudi2 ASIC. It also contains the code to initialize the F/W of the Gaudi2 ASIC and to receive events from the F/W. It adds a new debugfs entry to dump razwi events. A razwi is a case where the device's engines create a transaction that reaches an invalid destination. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi2: add asic registers header files | Oded Gabbay | 168 | -2/+136492
Add the relevant GAUDI2 ASIC registers header files. These files are generated automatically by a tool maintained by the VLSI engineers. There are more files that are not upstreamed because the driver uses only very few defines from them. For those files, I copied the relevant defines into gaudi2_regs.h and gaudi2_masks.h to reduce the size of this patch. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: remove redundant argument in access_dev_mem APIs | Ofir Bitton | 3 | -9/+7
The region structure is derived from the region type, so there is no need to pass it as an argument. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: use %pa to print pci bar size | Oded Gabbay | 2 | -28/+22
The PCI BAR size is of type resource_size_t, so we should use %pa to print it correctly on all architectures. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
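In the kernel, %pa takes a pointer to a phys_addr_t or resource_size_t (for example pr_info("%pa\n", &size)), so the format string no longer depends on the type's width. The userspace analogue below shows the same portability problem and one standard fix: widen to uintmax_t and use PRIxMAX. The typedef my_resource_size_t and the function name are hypothetical; the typedef arbitrarily follows uintptr_t only to make the width platform-dependent.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical type whose width varies by platform, like the kernel's
 * resource_size_t (here it arbitrarily follows uintptr_t). */
typedef uintptr_t my_resource_size_t;

/* A fixed conversion such as "%llx" is wrong wherever the type is 32-bit.
 * Widening to uintmax_t with the matching PRIxMAX macro is correct on
 * every platform. Returns the snprintf() result. */
static int format_bar_size(char *buf, size_t len, my_resource_size_t size)
{
	return snprintf(buf, len, "BAR size 0x%" PRIxMAX, (uintmax_t)size);
}
```

Formatting 0x10000000 this way yields the string "BAR size 0x10000000" regardless of whether the type is 32 or 64 bits wide.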
2022-07-12 | habanalabs/gaudi: replace hl_poll_timeout with while loop | Dafna Hirschfeld | 1 | -12/+11
In gaudi_scrub_device_mem(), replace the call to hl_poll_timeout() with a while loop to avoid using dummy variables. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: communicate supported page sizes to user | Ohad Sharabi | 5 | -19/+6
Because in future ASICs the driver will allow the user to set the page size, we need to make sure this data is propagated through all APIs. In addition, since this is already an ASIC property, we no longer need an ASIC function for it. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: remove dead code from free_device_memory() | Tomer Tayar | 1 | -28/+22
free_device_memory() ends with an if/else in which each branch has a return statement, followed by another return statement that can never be reached. Restructure the function and remove this dead code. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi: enable error interrupt on ARB WDT | Oded Gabbay | 1 | -0/+1
We want to receive an error interrupt if the watchdog timer expires on an arbitration event in the queues. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs: page size can only be a power of 2 | Ohad Sharabi | 4 | -7/+2
We dropped support for page sizes that are not a power of 2. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
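One practical consequence of requiring power-of-2 page sizes is that alignment checks and rounding become mask operations instead of 64-bit division or modulo. The kernel provides is_power_of_2() for the validity check; the helpers below are hypothetical userspace equivalents written for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a non-zero power of 2: such a number has exactly one bit set. */
static bool is_pow2_u64(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Valid only for power-of-2 page sizes: the low bits form the offset. */
static bool is_page_aligned(uint64_t addr, uint64_t page_size)
{
	return (addr & (page_size - 1)) == 0;
}

/* Valid only for power-of-2 page sizes: round size up to a page multiple. */
static uint64_t round_up_to_page(uint64_t size, uint64_t page_size)
{
	return (size + page_size - 1) & ~(page_size - 1);
}
```

With a non-power-of-2 size such as 3 MiB, the mask-based helpers would give wrong answers, which is why dropping that support simplifies the code.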
2022-07-12 | habanalabs: refactor dma asic-specific functions | Ohad Sharabi | 8 | -152/+162
This is a prerequisite patch for adding tracepoints to the DMA memory operations (allocation/free) in the driver. The main purpose is to be able to cross-reference the data with the map operations and determine whether a memory violation occurred, for example freeing a DMA allocation before unmapping it from the device memory. To achieve this, the DMA alloc/free code flows were refactored so that a single DMA tracepoint catches many flows. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi: remove unused enum | Oded Gabbay | 1 | -22/+9
Also beautify the code by preferring single lines wherever possible. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi: mask constant value before cast | Oded Gabbay | 1 | -4/+4
This fixes the sparse warning "cast truncates bits from constant value". Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 | habanalabs/gaudi: use correct type in assignment | Oded Gabbay | 1 | -1/+1
Packets are defined as little-endian (LE), so we need to convert values before assigning them to packet fields. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
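In the kernel, such packet fields are typed __le32 and assigned with cpu_to_le32(), and sparse warns when a plain CPU-order integer is assigned directly. The portable userspace sketch below shows what that conversion guarantees: the least significant byte is stored first regardless of the host's byte order. The packet layout and the put_le32()/get_le32() names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet with one 32-bit control word read by the device
 * as little-endian. */
struct fake_packet {
	uint8_t ctl[4];
};

/* Byte-order-independent equivalent of assigning cpu_to_le32(val). */
static void put_le32(uint8_t dst[4], uint32_t val)
{
	dst[0] = (uint8_t)(val);
	dst[1] = (uint8_t)(val >> 8);
	dst[2] = (uint8_t)(val >> 16);
	dst[3] = (uint8_t)(val >> 24);
}

/* Byte-order-independent equivalent of le32_to_cpu(). */
static uint32_t get_le32(const uint8_t src[4])
{
	return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
	       ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}
```

On a little-endian host the conversion is a no-op, which is why a missing conversion often goes unnoticed until sparse or a big-endian build flags it.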
2022-07-12 | habanalabs/gaudi: fix function name in comment | Oded Gabbay | 1 | -1/+1
The function name in the comment didn't match the actual function name. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>