summaryrefslogtreecommitdiff
path: root/drivers/misc/habanalabs/common/habanalabs_ioctl.c
AgeCommit message (Collapse)AuthorFilesLines
2023-01-26habanalabs: move driver to accel subsystemOded Gabbay1-1190/+0
Now that we have a subsystem for compute accelerators, move the habanalabs driver to it. This patch only moves the files and fixes the Makefiles. Future patches will change the existing code to register to the accel subsystem and expose the accel device char files instead of the habanalabs device char files. Update the MAINTAINERS file to reflect this change. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26habanalabs/uapi: move uapi file to drmOded Gabbay1-1/+1
Move the habanalabs.h uapi file from include/uapi/misc to include/uapi/drm, and rename it to habanalabs_accel.h. This is required before moving the actual driver to the accel subsystem. Update MAINTAINERS file accordingly. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26habanalabs: pass-through request from user to f/wfarah kassabri1-0/+51
Add a uAPI, as part of the INFO IOCTL, to allow users to send requests directly to f/w, according to a pre-defined set of opcodes that the f/w exposes. The f/w will put the result in a kernel-allocated buffer, which the driver will then copy to the user-supplied buffer. This will allow f/w tools to communicate directly with the f/w without the need to add a new uAPI to the driver for each new type of request. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23habanalabs: skip events info ioctl if not supportedOhad Sharabi1-0/+4
Some ASICs haven't yet implemented this functionality and so the ioctl call should fail and the user should be notified of the reason. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23habanalabs/gaudi2: add PCI revision 2 supportOfir Bitton1-2/+4
Add support for Gaudi2 Device with PCI revision 2. Functionality is exactly the same as revision 1, the only difference is device name exposed to user. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23habanalabs: allow unregistering eventfd when device non-operationalTomer Tayar1-3/+3
Unregistering eventfd is for releasing host resources and doesn't involve an access to the device. As such, there is no reason to disallow it when device isn't operational. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23habanalabs: add page fault info uapiDani Liberman1-0/+42
Only the first page fault will be saved. Besides the address which caused the page fault, the driver captures all of the mmu user mappings. User can retrieve this data via the new uapi (new opcode in INFO ioctl). Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23habanalabs: refactor razwi event notificationDani Liberman1-9/+3
This event notification was compatible only with gaudi, where razwi and page fault happens together. To make it compatible with all ASICs, this refactor contains: 1. Razwi notification will only notify about razwi info. New notification will be added in future patch, to retrieve data about page fault error. 2. Changed razwi info structure to support all ASICs. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19habanalabs/gaudi2: add secured attestation info uapiDani Liberman1-0/+52
User will provide a nonce via the ioctl, and will retrieve secured attestation data of the boot, generated using given nonce. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19habanalabs: rename error info structureDani Liberman1-15/+15
As a preparation for adding more errors to it, change to more suitable name. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18habanalabs: expose device security status using info ioctlOfir Bitton1-0/+1
In order for the user to know if he is running on a secured device or not, we add it also to the hw_ip info ioctl. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18habanalabs: add uapi to retrieve engines statusDani Liberman1-0/+40
Currently, to get engines status, user needed to read debugfs file with root permissions. This new uapi allows user apace apps retrieve status, so for example, in case of failure, status can be retrieved immediately by the application itself which runs without root permissions. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12habanalabs: rename soft reset to compute resetOded Gabbay1-1/+1
Doing compute reset can be the traditional inference soft reset that is supported only in Goya. Or it can be the new reset upon device release, which is supported in Gaudi2 and above. Therefore, wherever suitable, use the terminology of compute reset instead of soft reset. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12habanalabs: add gaudi2 MMU supportMoti Haimovski1-1/+1
Gaudi2 has new MMU units. A PMMU for device->host accesses, and HMMU for HBM accesses. The page tables of both MMUs are located in the host's memory (referred to in the code as host-resident pgt). Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12habanalabs: initialize new asic propertiesOded Gabbay1-7/+14
New asic properties were added for Gaudi2. We want to initialize and use them, when relevant, also for Goya and Gaudi. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12habanalabs: add gaudi2 asic-specific codeOded Gabbay1-1/+1
Add the ASIC-specific code for Gaudi2. Supply (almost) all of the function callbacks that the driver's common code need to initialize, finalize and submit workloads to the Gaudi2 ASIC. It also contains the code to initialize the F/W of the Gaudi2 ASIC and to receive events from the F/W. It contains new debugfs entry to dump razwi events. razwi is a case where the device's engines create a transaction that reaches an invalid destination. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12habanalabs: communicate supported page sizes to userOhad Sharabi1-1/+1
Because in future ASICs the driver will allow the user to set the page size we need to make sure this data is propagated in all APIs. In addition, since this is already an ASIC property we no longer need ASIC function for it. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12habanalabs: expose undefined opcode status via info ioctlTal Cohen1-0/+25
The info ioctl retrieves information on the last undefined opcode occurred. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-05-22habanalabs: use separate structure info for each error collect dataTal Cohen1-9/+9
Create separate info structure for each error type. The structures shall be used inside the large structure that contains the last session error. This is more scalable for adding more errors in the future. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22habanalabs: return -EFAULT on copy_to_user errorOded Gabbay1-3/+1
If copy_to_user failed in info ioctl, we always return -EFAULT so the user will know there was an error. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22habanalabs: use NULL for eventfdOded Gabbay1-2/+2
eventfd is pointer. As such, it should be initialized to NULL, not to 0. In addition, no need to initialize it after creation because the entire structure is zeroed-out. Also, no need to initialize it before release because the entire structure is freed. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22habanalabs: add support for notification via eventfdTal Cohen1-0/+65
The driver will be able to send notification events towards a user process, using user's registered event file descriptor. The driver uses the notification mechanism to inform the user about an occurred event. A user thread can wait until a notification is received from the driver. The driver stores the occurred event until the user reads it, using HL_INFO_GET_EVENTS - new ioctl opcode in the INFO ioctl. Gaudi specific implementation includes sending a notification on a TPC assertion event that is received from f/w. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22habanalabs: expose compute ctx status through info ioctlOfir Bitton1-0/+2
In order for the user to know if he can try and open device, we expose the compute ctx state. The user can now know if the context is used by another process or whether the device is still ongoing through cleanup or reset and will be available soon. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22habanalabs: add user API to get valid DRAM page sizesOhad Sharabi1-0/+24
Future devices will support multiple device memory page sizes. In addition, an API for the user was added for it to be able to control the device memory allocation page size. This patch is a complementary patch to inform the user of the available page size supported by the device. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22habanalabs: add DRAM default page size to HW infoOhad Sharabi1-0/+1
When using the device memory allocation API the user ought to know what is the default allocation page size. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-28habanalabs: expose number of user interruptsOded Gabbay1-2/+2
Currently we only expose to the user the ID of the first available user interrupt. To make user interrupts allocation truly dynamic, we need to also expose the number of user interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: remove hwmgr.cOded Gabbay1-1/+1
The two remaining functions in this file belong to firmware_if.c, as they communicate with the firmware. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28habanalabs: get clk is common functionOded Gabbay1-5/+4
Retrieving the clock from the f/w is done exactly the same in ALL our ASICs. Therefore, no real justification for doing it as an ASIC-specific function. The only thing is we need to check if we are running on simulator, which doesn't require ASIC-specific callback. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: refactor reset information variablesOfir Bitton1-2/+2
Unify variables related to device reset, which will help us to add some new reset functionality in future patches. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: replace some -ENOTTY with -EINVALOded Gabbay1-2/+2
-ENOTTY is returned in case of error in the ioctl arguments themselves, such as function that doesn't exists. In all other cases, where the error is in the arguments of the custom data structures that we define that are passed in the various ioctls, we need to return -EINVAL. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: return correct clock throttling periodOfir Bitton1-2/+2
Current clock throttling period returned from driver was wrong due to wrong time comparison. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: fix etr asid configurationOded Gabbay1-6/+7
Pass the user's context pointer into the etr configuration function to extract its ASID. Using the compute_ctx pointer is an error as it is just an indication of whether a user has opened the compute device. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: enable access to info ioctl during hard resetDani Liberman1-7/+0
Because info ioctl is used to retrieve data, some of its opcodes may be used during hard reset. Other ioctls should be blocked while device is not operational. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: add more info ioctls support during resetOfir Bitton1-28/+27
Some info ioctls can be served even if the device is disabled or in reset. Hence, we enable more info ioctls during reset, as these ioctls do not require any H/W nor F/W communication. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: add support for fetching historic errorsDani Liberman1-0/+60
A new uAPI is added for debug purposes of the user-space to retrieve errors related data from previous session (before device reset was performed). Inforamtion is filled when a razwi or CS timeout happens and can contain one of the following: 1. Retrieve timestamp of last time the device was opened and razwi or CS timeout happened. 2. Retrieve information about last CS timeout. 3. Retrieve information about last razwi error. This information doesn't contain user data, so no danger of data leakage between users. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: add new opcodes for INFO IOCTLfarah kassabri1-0/+43
Add implementation for new opcodes in the INFO IOCTL: 1. Retrieve the replaced DRAM rows from f/w. 2. Retrieve the pending DRAM rows from f/w. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26habanalabs: expand clock throttling information uAPIOfir Bitton1-2/+25
In addition to the clock throttling reason, user should be able to obtain also the start time and the duration of the throttling event. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-01habanalabs: expose server type in INFO IOCTLOded Gabbay1-0/+2
Add the server type property to the hl_info_hw_ip_info structure that is exposed to the user via the INFO IOCTL. This is needed by the userspace s/w stack to know the connections map of the internal links that connect the ASIC among themselves inside the server. The F/W will tell us, as part of the NIC information, the server type that the GAUDI is located in. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: added open_stats info ioctlYuri Nudelman1-0/+21
In a system with multiple ASICs, there is a need to provide monitoring tools with information on how long a device was opened and how many times a device was opened. Therefore, we add a new opcode to the INFO ioctl to provide that information. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: add missing space after castingOmer Shpigelman1-1/+1
Change casting code according to kernel coding style. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: improve utilization calculationKoby Elbaz1-8/+3
The new approach is based on the notion that the relative current power consumption is in relation of proportionality to device's true utilization. Utilization info ranges between [0,100]% Currently, dc_power values are hard-coded. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: return current power via INFO IOCTLSagiv Ozeri1-0/+22
Add driver implementation for reading the current power from the device CPU F/W. Signed-off-by: Sagiv Ozeri <sozeri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: wait for interrupt supportOfir Bitton1-1/+1
In order to support command submissions from user space, the driver need to add support for user interrupt completions. The driver will allow multiple user threads to wait for an interrupt and perform a comparison with a given user address once interrupt expires. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-03-10habanalabs: Disable file operations after device is removedTomer Tayar1-0/+12
A device can be removed from the PCI subsystem while a process holds the file descriptor opened. In such a case, the driver attempts to kill the process, but as it is still possible that the process will be alive after this step, the device removal will complete, and we will end up with a process object that points to a device object which was already released. To prevent the usage of this released device object, disable the following file operations for this process object, and avoid the cleanup steps when the file descriptor is eventually closed. The latter is just a best effort, as memory leak will occur. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-02-08habanalabs: support fetching first available user CQOfir Bitton1-1/+2
User must be aware of the available CQs when it needs to use them. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: modify device_idle interfaceOhad Sharabi1-2/+3
Currently this API uses single 64 bits mask for engines idle indication. Recently, it was observed that more bits are needed for some ASICs. This patch modifies the use of the idle mask and the idle_extensions mask. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: add user available interrupt to hw_ipOfir Bitton1-0/+2
In order to support completions that arrive directly to the user, the driver needs to supply the user with the first available msix interrupt available. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: report correct dram size in info ioctlOfir Bitton1-1/+9
In case MMU is enabled, we must take MMU page size into consideration when reporting dram size to the user. This is because the MMU page size can be a value which is NOT a power-of-2 value. As a result, the total DRAM size (which is always a power-of-2 value) needed to be rounded-down. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: return dram virtual address in info ioctlAlon Mizrahi1-1/+3
When working with DRAM MMU, we should supply the userspace with the virtual start address of the DRAM instead of the physical one. This is because the physical one has no meaning for the user as he only knows the virtual address range. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: report dram_page_size in hw_ip_info ioctlMoti Haimovski1-0/+1
Instead of having it hard-coded as a define, pass it to the user in runtime. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>