summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-04-11Merge tag 'mhi-for-v5.13' of ↵Greg Kroah-Hartman8-276/+775
git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI changes for v5.13 core: - Added support for Flash Programmer execution environment which allows the host machine (like x86) to flash the modem firmware to NAND or eMMC in the modem. The MHI bus will expose EDL channels (34, 35) and then the opensource QDL tool [1] can be used to flash the firmware from the host. - Added an internal helper for polling the MHI registers with a retry interval. This helper is used now to poll for the MHI ready state in MHI STATUS register. - Various fixes for issues found during the bringup of SDX24/SDX55 based Quectel and Telit modems. - Updates to the Execution environment handling for proper downloading of the AMSS image from SBL (Secondary Bootloader) mode. - Added support for sending STOP channel command to the MHI device and also made changes to the MHI core for proper handling of stop and restart. - Fixed the runtime_pm handling in the core by forcing the device to be in wake mode until TX completion and allowing it to suspend for RX. - Added sanity checks for values read from the device to avoid crash if those are corrupted somehow. - Fixed warnings generated by sparse (W=2) - Couple of kernel doc cleanups in mhi.h pci_generic: - Added support for runtime PM and generic PM - Added Firehose channels for flashing the firmware - Added support for modems such as Quectel EM1XXGR-L, SDX24, SDX65, Foxconn T99W175 exposing relevant channels. [1] https://git.linaro.org/landing-teams/working/qualcomm/qdl.git * tag 'mhi-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: (49 commits) bus: mhi: fix typo in comments for struct mhi_channel_config bus: mhi: core: Fix shadow declarations bus: mhi: pci_generic: Constify mhi_controller_config struct definitions bus: mhi: pci_generic: Introduce Foxconn T99W175 support bus: mhi: core: Sanity check values from remote device before use bus: mhi: pci_generic: Add FIREHOSE channels bus: mhi: pci_generic: Implement PCI shutdown callback bus: mhi: Improve documentation on channel transfer setup APIs bus: mhi: core: Remove __ prefix for MHI channel unprepare function bus: mhi: core: Check channel execution environment before issuing reset bus: mhi: core: Clear configuration from channel context during reset bus: mhi: core: Hold device wake for channel update commands bus: mhi: core: Update debug messages to use client device bus: mhi: core: Improvements to the channel handling state machine bus: mhi: core: Clear context for stopped channels from remove() bus: mhi: core: Allow sending the STOP channel command bus: mhi: pci_generic: Add SDX65 based modem support bus: mhi: core: Remove pre_init flag used for power purposes bus: mhi: pm: reduce PM state change verbosity bus: mhi: core: Fix MHI runtime_pm behavior ...
2021-04-11Merge tag 'misc-habanalabs-next-2021-04-10' of ↵Greg Kroah-Hartman31-517/+2193
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next Oded writes: This tag contains habanalabs driver changes for v5.13: - Add support to reset device after the user closes the file descriptor. Because we support a single user, we can reset the device (if needs to) after a user closes its file descriptor to make sure the device is in idle and clean state for the next user. - Add a new feature to allow the user to wait on interrupt. This is needed for future ASICs - Replace GFP_ATOMIC with GFP_KERNEL wherever possible and add code to support failure of allocating with GFP_ATOMIC. - Update code to support the latest firmware image: - More security features are done in the firmware - Remove hard-coded assumptions and replace them with values that are sent to the firmware on loading. - Print device unusable error - Reset device in case the communication between driver and firmware gets out of sync. - Support new PCI device ids for secured GAUDI. - Expose current power draw through the INFO IOCTL. - Support resetting the device upon a request from the BMC (through F/W). - Always use only a single MSI in GAUDI, due to H/W limitation. - Improve data-path code by taking out code from spinlock protection. - Allow user to specify custom timeout per Command Submission. - Some enhancements to debugfs. - Various minor changes and improvements. * tag 'misc-habanalabs-next-2021-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (41 commits) habanalabs: print f/w boot unknown error habanalabs: update to latest F/W communication header habanalabs/gaudi: skip iATU if F/W security is enabled habanalabs/gaudi: derive security status from pci id habanalabs: move dram scrub to free sequence habanalabs: send dynamic msi-x indexes to f/w habanalabs/gaudi: clear QM errors only if not in stop_on_err mode habanalabs: support DEVICE_UNUSABLE error indication from FW habanalabs: use strscpy instead of sprintf and strlcpy habanalabs: remove the store jobs array from CS IOCTL habanalabs/gaudi: add debugfs to DMA from the device habanalabs/gaudi: sync stream add protection to SOB reset flow habanalabs: add custom timeout flag per cs habanalabs: improve utilization calculation habanalabs: support legacy and new pll indexes habanalabs: move relevant datapath work outside cs lock habanalabs: avoid soft lockup bug upon mapping error habanalabs/gaudi: Update async events header habanalabs/gaudi: unsecure TPC cfg status registers habanalabs/gaudi: always use single-msi mode ...
2021-04-10fbdev: zero-fill colormap in fbcmap.cPhillip Potter1-4/+4
Use kzalloc() rather than kmalloc() for the dynamically allocated parts of the colormap in fb_alloc_cmap_gfp, to prevent a leak of random kernel data to userspace under certain circumstances. Fixes a KMSAN-found infoleak bug reported by syzbot at: https://syzkaller.appspot.com/bug?id=741578659feabd108ad9e06696f0c1f2e69c4b6e Reported-by: syzbot+47fa9c9c648b765305b9@syzkaller.appspotmail.com Cc: stable <stable@vger.kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20210331220719.1499743-1-phil@philpotter.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10firmware: qcom-scm: Fix QCOM_SCM configurationHe Ying1-0/+1
When CONFIG_QCOM_SCM is y and CONFIG_HAVE_ARM_SMCCC is not set, compiling errors are encountered as follows: drivers/firmware/qcom_scm-smc.o: In function `__scm_smc_do_quirk': qcom_scm-smc.c:(.text+0x36): undefined reference to `__arm_smccc_smc' drivers/firmware/qcom_scm-legacy.o: In function `scm_legacy_call': qcom_scm-legacy.c:(.text+0xe2): undefined reference to `__arm_smccc_smc' drivers/firmware/qcom_scm-legacy.o: In function `scm_legacy_call_atomic': qcom_scm-legacy.c:(.text+0x1f0): undefined reference to `__arm_smccc_smc' Note that __arm_smccc_smc is defined when HAVE_ARM_SMCCC is y. So add dependency on HAVE_ARM_SMCCC in QCOM_SCM configuration. Fixes: 916f743da354 ("firmware: qcom: scm: Move the scm driver to drivers/firmware") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: He Ying <heying24@huawei.com> Link: https://lore.kernel.org/r/20210406094200.60952-1-heying24@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10speakup: i18n: Switch to kmemdup_nul() in spk_msg_set()Yang Yingliang1-3/+1
Use kmemdup_nul() helper instead of open-coding to simplify the code in spk_msg_set(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20210406034434.442251-1-yangyingliang@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10w1: ds28e17: Use module_w1_family to simplify the codeChen Huang1-15/+1
module_w1_family() makes the code simpler by eliminating boilerplate code. Signed-off-by: Chen Huang <chenhuang5@huawei.com> Link: https://lore.kernel.org/r/20210408130954.1158963-2-chenhuang5@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10w1: ds2805: Use module_w1_family to simplify the codeChen Huang1-14/+1
module_w1_family() makes the code simpler by eliminating boilerplate code. Signed-off-by: Chen Huang <chenhuang5@huawei.com> Link: https://lore.kernel.org/r/20210408130954.1158963-1-chenhuang5@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10binder: tell userspace to dump current backtrace when detected oneway spammingHang Lu5-8/+56
When async binder buffer got exhausted, some normal oneway transactions will also be discarded and may cause system or application failures. By that time, the binder debug information we dump may not be relevant to the root cause. And this issue is difficult to debug if without the backtrace of the thread sending spam. This change will send BR_ONEWAY_SPAM_SUSPECT to userspace when oneway spamming is detected, request to dump current backtrace. Oneway spamming will be reported only once when exceeding the threshold (target process dips below 80% of its oneway space, and current process is responsible for either more than 50 transactions, or more than 50% of the oneway space). And the detection will restart when the async buffer has returned to a healthy state. Acked-by: Todd Kjos <tkjos@google.com> Signed-off-by: Hang Lu <hangl@codeaurora.org> Link: https://lore.kernel.org/r/1617961246-4502-3-git-send-email-hangl@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10binder: fix the missing BR_FROZEN_REPLY in binder_return_stringsHang Lu2-2/+3
Add BR_FROZEN_REPLY in binder_return_strings to support stat function. Fixes: ae28c1be1e54 ("binder: BINDER_GET_FROZEN_INFO ioctl") Acked-by: Todd Kjos <tkjos@google.com> Signed-off-by: Hang Lu <hangl@codeaurora.org> Link: https://lore.kernel.org/r/1617961246-4502-2-git-send-email-hangl@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-09bus: mhi: fix typo in comments for struct mhi_channel_configJarvis Jiang1-1/+1
The word 'rung' is a typo in below comment, fix it. * @event_ring: The event rung index that services this channel Signed-off-by: Jarvis Jiang <jarvis.w.jiang@gmail.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20210408100220.3853-1-jarvis.w.jiang@gmail.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2021-04-09habanalabs: print f/w boot unknown errorOded Gabbay1-16/+68
We need to print a message to the kernel log in case we encounter an unknown error in the f/w boot to help the user understand what happened. In addition, we shouldn't print unknown error in case of known errors. Moreover, in case of warnings/info, we shouldn't return -EIO that will fail the initialization and mark the device as disabled Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: update to latest F/W communication headerOhad Sharabi2-1/+200
update files to latest version from F/W team. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: skip iATU if F/W security is enabledOfir Bitton4-1/+101
As part of the securing GAUDI, the F/W will configure the PCI iATU regions. If the driver identifies a secured PCI ID, it will know to skip iATU configuration in a very early stage. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: derive security status from pci idOfir Bitton8-8/+35
As F/ security indication must be available before driver approaches PCI bus, F/W security should be derived from PCI id rather than be fetched during boot handshake with F/W. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: move dram scrub to free sequenceBharat Jauhari1-39/+48
DRAM scrubbing can take time hence it adds to latency during allocation. To minimize latency during initialization, scrubbing is moved to release call. In case scrubbing fails it means the device is in a bad state, hence HARD reset is initiated. Signed-off-by: Bharat Jauhari <bjauhari@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: send dynamic msi-x indexes to f/wOhad Sharabi5-19/+131
In order to minimize hard coded values between F/W and the driver, we send msi-x indexes dynamically to the F/W. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: clear QM errors only if not in stop_on_err modeTomer Tayar1-1/+2
Clearing QM errors by the driver will prevent these H/W blocks from stopping in case they are configured to stop on errors, so perform this clearing only if this mode is not in use. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: support DEVICE_UNUSABLE error indication from FWKoby Elbaz2-0/+7
In case of multiple ECC errors, FW will set the DEVICE_UNUSABLE bit. On boot-up, the driver will therefore fail inserting the device. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: use strscpy instead of sprintf and strlcpyOded Gabbay1-2/+2
Prefer the use of strscpy when copying the ASIC name into a char array, to prevent accidentally exceeding the array's length. In addition, strlcpy is frowned upon so replace it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: remove the store jobs array from CS IOCTLOded Gabbay1-25/+10
The store part was never implemented in the code and never been used by the userspace applications. We currently use the related parameters to a different purpose with a defined union. However, there is no point in that and it is better to just remove the union and the store parameters. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: add debugfs to DMA from the deviceOded Gabbay5-20/+351
When trying to debug program, the user often needs to dump large parts of the device's DRAM, which can reach to tens of GBs. Because reading from the device's internal memory through the PCI BAR is extremely slow, the debug can take hours. Instead, we can provide the user to copy data through one of the DMA engines. This will make the operation much faster. Currently, only GAUDI is supported. In GAUDI, we need to find a PCI DMA engine that is IDLE and set the DMA as secured to be able to bypass our MMU as we currently don't map the temporary buffer to the MMU. Example bash one-line to dump entire HBM to file (~2 minutes): for (( i=0x0; i < 0x800000000; i+=0x8000000 )); do \ printf '0x%x\n' $i | sudo tee /sys/kernel/debug/habanalabs/hl0/addr ; \ echo 0x8000000 | sudo tee /sys/kernel/debug/habanalabs/hl0/dma_size ; \ sudo cat /sys/kernel/debug/habanalabs/hl0/data_dma >> hbm.txt ; done Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: sync stream add protection to SOB reset flowfarah kassabri1-4/+12
Since we moved the SOB reset flow to workqueue and not part of the fence release flow, we might reach a scenario where new context is created while we in the middle of resetting the SOB. in such cases the reset may fail due to idle check. This will mess up the streams sync since the SOB value is invalid. so we protect this area with a mutex, to delay context creation. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: add custom timeout flag per csAlon Mizrahi4-18/+36
There is a need to allow to user to send command submissions with custom timeout as some CS take longer than the max timeout that is used by default. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: improve utilization calculationKoby Elbaz9-169/+40
The new approach is based on the notion that the relative current power consumption is in relation of proportionality to device's true utilization. Utilization info ranges between [0,100]% Currently, dc_power values are hard-coded. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: support legacy and new pll indexesOhad Sharabi9-36/+182
In order to use minimum of hard coded values common to LKD and F/W a dynamic method to work with PLLs is introduced in this patch. Formerly asic specific PLL numbering is now common for all asics. To be backward compatible a bit in dev status is defined, if the bit is not set LKD will keep working with old PLL numbering. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: move relevant datapath work outside cs lockOfir Bitton3-35/+68
In order to shorten the time cs lock is being held, we move any possible work outside of the cs lock. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: avoid soft lockup bug upon mapping errorfarah kassabri1-3/+17
Add a little sleep between page unmappings in case mapping of large number of host pages failed, in order to avoid soft lockup bug during the rollback. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: Update async events headerOfir Bitton3-14/+25
Update with latest version from the Firmware team. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: unsecure TPC cfg status registersOfir Bitton1-8/+0
Unsecure relevant registers as TPC engine need access to TPC status. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: always use single-msi modeOded Gabbay1-2/+1
The device can get into deadlock in case it use indirect mode for MSI interrupts (multi-msi) and have hard-reset during interrupt storm. To prevent that, always use direct mode which means single-msi mode. The F/W will prevent the host from writing to the indirect MSI registers to prevent any malicious user from causing this scenario. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: reset device upon BMC requestOfir Bitton3-1/+6
In case the BMC of the devices' box wants to initiate a reset of a specific device, it must go through driver. Once driver will receive the request it will initiate a hard reset flow. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: debugfs access to user mapped host addressesOfir Bitton5-41/+144
In order to have a better debuggability we allow debugfs access to user mmu mapped host memory. Non-user host memory access will be rejected. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: Switch to using the new API kobj_to_dev()Yang Li1-1/+1
fixed the following coccicheck: ./drivers/misc/habanalabs/common/sysfs.c:347:60-61: WARNING opportunity for kobj_to_dev() Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: update hl_boot_if.hOhad Sharabi1-0/+11
Update to the latest version of the file as supplied by the F/W. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: skip DISABLE PCI packet to FW on heartbeatOhad Sharabi7-44/+59
if reset is due to heartbeat, device CPU is no responsive in which case no point sending PCI disable message to it. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: replace GFP_ATOMIC with GFP_KERNELOfir Bitton6-14/+36
As there are incorrect assumptions in which some of the initialization and data path flows cannot sleep, most allocations are being done using GFP_ATOMIC. We modify the code to use GFP_ATOMIC only when realy needed, as sleepable flow should use GFP_KERNEL. In addition add a fallback to allocate memory using GFP_KERNEL, once ATOMIC allocation fails. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs/gaudi: update extended async event headerOfir Bitton1-5/+5
Update to the latest definition of the firmware Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: return current power via INFO IOCTLSagiv Ozeri5-0/+60
Add driver implementation for reading the current power from the device CPU F/W. Signed-off-by: Sagiv Ozeri <sozeri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: support HW blocks vm showSagiv Ozeri5-8/+113
Improve "vm" debugfs node to print also the virtual addresses which are currently mapped to HW blocks in the device. Signed-off-by: Sagiv Ozeri <sozeri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: use a single FW loading bringup flagOfir Bitton5-11/+17
For simplicity, use a single bringup flag indicating which FW binaries should loaded to device. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: use correct define for 32-bit max valueOded Gabbay1-1/+1
Timeout in wait for interrupt is in 32-bit variable so we need to use the correct maximum value to compare. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: wait for interrupt supportOfir Bitton6-29/+322
In order to support command submissions from user space, the driver need to add support for user interrupt completions. The driver will allow multiple user threads to wait for an interrupt and perform a comparison with a given user address once interrupt expires. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: enable all IRQs for user interrupt supportOfir Bitton3-2/+74
In order to support user interrupts, driver must enable all MSI-X interrupts for any case user will trigger them. We differentiate between a valid user interrupt and a non valid one. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: reset device in case of sync errorOhad Sharabi5-0/+52
As the F/wW is the first to detect out of sync event, a new event is added to notify the driver on such event. In which case the driver performs hard reset. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: change default CS timeout to 30 secondsOded Gabbay1-2/+2
Because our graph contains network operations, we need to account for delay in the network. 5 seconds timeout per CS is not enough to account for that. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: print if device is used on FD closeOded Gabbay2-4/+6
Notify to the user that although he closed the FD, the device is still in use because there are live CS and/or memory mappings (mmaps). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: reset_upon_device_release is for bring-upOded Gabbay1-2/+1
Move the field to correct location in structure and remove comment. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: fail reset if device is not idleOded Gabbay1-14/+12
After any reset (soft or hard) the device (the engines/QMANs) should be idle. If they are not idle, fail the reset. If it is soft-reset, the driver will try to do hard-reset automatically. If it is hard-reset, the driver will make the device non-operational. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: reset after device is actually releasedOded Gabbay1-16/+16
The device is actually released only after the refcnt of the hpriv structure is 0, which means all its contexts were closed. If we reset the device while a context is still open, there are possibilities for unexpected behavior and crashes. For example, if the process has a mapping of a register block that is now currently being reset, and the process writes/reads to that block during the reset, the device can get stuck. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09habanalabs: add reset support when user closes FDOfir Bitton2-2/+20
In order to support command submissions that are done directly from user space, the driver must perform soft reset once user closes its FD. In case the soft reset fails or device is not idle, a hard reset should be performed. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>