kernel/linux.git/drivers/s390/crypto/ap_queue.c, branch v5.10.257

s390/ap: Fix hanging ioctl caused by orphaned replies

2021-11-18T13:04:30+00:00

commit 3826350e6dd435e244eb6e47abad5a47c169ebc2 upstream. When a queue is switched to soft offline during heavy load and later switched to soft online again and now used, it may be that the caller is blocked forever in the ioctl call. The failure occurs because there is a pending reply after the queue(s) have been switched to offline. This orphaned reply is received when the queue is switched to online and is accidentally counted for the outstanding replies. So when there was a valid outstanding reply and this orphaned reply is received it counts as the outstanding one thus dropping the outstanding counter to 0. Voila, with this counter the receive function is not called any more and the real outstanding reply is never received (until another request comes in...) and the ioctl blocks. The fix is simple. However, instead of readjusting the counter when an orphaned reply is detected, I check the queue status for not empty and compare this to the outstanding counter. So if the queue is not empty then the counter must not drop to 0 but at least have a value of 1. Signed-off-by: Harald Freudenberger Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik Signed-off-by: Greg Kroah-Hartman

s390/ap: fix state machine hang after failure to enable irq

2021-09-15T07:50:27+00:00

[ Upstream commit cabebb697c98fb1f05cc950a747a9b6ec61a5b01 ] If for any reason the interrupt enable for an ap queue fails the state machine run for the queue returned wrong return codes to the caller. So the caller assumed interrupt support for this queue in enabled and thus did not re-establish the high resolution timer used for polling. In the end this let to a hang for the user space process waiting "forever" for the reply. This patch reworks these return codes to return correct indications for the caller to re-establish the timer when a queue runs without interrupt support. Please note that this is fixing a wrong behavior after a first failure (enable interrupt support for the queue) failed. However, looks like this occasionally happens on KVM systems. Signed-off-by: Harald Freudenberger Signed-off-by: Heiko Carstens Signed-off-by: Sasha Levin

s390/ap: Fix hanging ioctl caused by wrong msg counter

2021-06-23T12:42:51+00:00

commit e73a99f3287a740a07d6618e9470f4d6cb217da8 upstream. When a AP queue is switched to soft offline, all pending requests are purged out of the pending requests list and 'received' by the upper layer like zcrypt device drivers. This is also done for requests which are already enqueued into the firmware queue. A request in a firmware queue may eventually produce an response message, but there is no waiting process any more. However, the response was counted with the queue_counter and as this counter was reset to 0 with the offline switch, the pending response caused the queue_counter to get negative. The next request increased this counter to 0 (instead of 1) which caused the ap code to assume there is nothing to receive and so the response for this valid request was never tried to fetch from the firmware queue. This all caused a queue to not work properly after a switch offline/online and in the end processes to hang forever when trying to send a crypto request after an queue offline/online switch cicle. Fixed by a) making sure the counter does not drop below 0 and b) on a successful enqueue of a message has at least a value of 1. Additionally a warning is emitted, when a reply can't get assigned to a waiting process. This may be normal operation (process had timeout or has been killed) but may give a hint that something unexpected happened (like this odd behavior described above). Signed-off-by: Harald Freudenberger Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik Signed-off-by: Greg Kroah-Hartman

s390/zcrypt: fix wrong format specifications

2020-10-09T21:45:30+00:00

Fixes 5 wrong format specification findings found by the kernel test robot in ap_queue.c: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] __func__, status.response_code, Signed-off-by: Harald Freudenberger Reported-by: kernel test robot Fixes: 2ea2a6099ae3 ("s390/ap: add error response code field for ap queue devices") Signed-off-by: Vasily Gorbik

s390/zcrypt: Introduce Failure Injection feature

2020-10-07T19:50:01+00:00

Introduce a way to specify additional debug flags with an crpyto request to be able to trigger certain failures within the zcrypt device drivers and/or ap core code. This failure injection possibility is only enabled with a kernel debug build CONFIG_ZCRYPT_DEBUG) and should never be available on a regular kernel running in production environment. Details: * The ioctl(ICARSAMODEXPO) get's a struct ica_rsa_modexpo. If the leftmost bit of the 32 bit unsigned int inputdatalength field is set, the uppermost 16 bits are separated and used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ICARSACRT) get's a struct ica_rsa_modexpo_crt. If the leftmost bit of the 32 bit unsigned int inputdatalength field is set, the uppermost 16 bits are separated and used als debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ZSECSENDCPRB) used to send CCA CPRBs get's a struct ica_xcRB. If the leftmost bit of the 32 bit unsigned int status field is set, the uppermost 16 bits of this field are used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ZSENDEP11CPRB) used to send EP11 CPRBs get's a struct ep11_urb. If the leftmost bit of the 64 bit unsigned int req_len field is set, the uppermost 16 bits of this field are used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. So it is possible to send an additional 16 bit value to the zcrypt API to be used to carry a failure injection command which may trigger special behavior within the zcrypt API and layers below. This 16 bit value is for the rest of the test referred as 'fi command' for Failure Injection. The lower 8 bits of the fi command construct a numerical argument in the range of 1-255 and is the 'fi action' to be performed with the request or the resulting reply: * 0x00 (all requests): No failure injection action but flags may be provided which may affect the processing of the request or reply. * 0x01 (only CCA CPRBs): The CPRB's agent_ID field is set to 'FF'. This results in an reply code 0x90 (Transport-Protocol Failure). * 0x02 (only CCA CPRBs): After the APQN to send to has been chosen, the domain field within the CPRB is overwritten with value 99 to enforce an reply with RY 0x8A. * 0x03 (all requests): At NQAP invocation the invalid qid value 0xFF00 is used causing an response code of 0x01 (AP queue not valid). The upper 8 bits of the fi command may carry bit flags which may influence the processing of an request or response: * 0x01: No retry. If this bit is set, the usual loop in the zcrypt API which retries an CPRB up to 10 times when the lower layers return with EAGAIN is abandoned after the first attempt to send the CPRB. * 0x02: Toggle special. Toggles the special bit on this request. This should result in an reply code RY~0x41 and result in an ioctl failure with errno EINVAL. This failure injection possibilities may get some further extensions in the future. As of now this is a starting point for Continuous Test and Integration to trigger some failures and watch for the reaction of the ap bus and zcrypt device driver code. Signed-off-by: Harald Freudenberger Signed-off-by: Vasily Gorbik

s390/ap/zcrypt: revisit ap and zcrypt error handling

2020-10-07T19:50:01+00:00

Revisit the ap queue error handling: Based on discussions and evaluatios with the firmware folk here is now a rework of the response code handling for all the AP instructions. The idea is to distinguish between failures because of some kind of invalid request where a retry does not make any sense and a failure where another attempt to send the very same request may succeed. The first case is handled by returning EINVAL to the userspace application. The second case results in retries within the zcrypt API controlled by a per message retry counter. Revisit the zcrpyt error handling: Similar here, based on discussions with the firmware people here comes a rework of the handling of all the reply codes. Main point here is that there are only very few cases left, where a zcrypt device queue is switched to offline. It should never be the case that an AP reply message is 'unknown' to the device driver as it indicates a total mismatch between device driver and crypto card firmware. In all other cases, the code distinguishes between failure because of invalid message (see above - EINVAL) or failures of the infrastructure (see above - EAGAIN). Signed-off-by: Harald Freudenberger Signed-off-by: Vasily Gorbik

s390/ap: add card/queue deconfig state

2020-10-07T19:50:00+00:00

This patch adds a new config state to the ap card and queue devices. This state reflects the response code 0x03 "AP deconfigured" on TQAP invocation and is tracked with every ap bus scan. Together with this new state now a card/queue device which is 'deconfigured' is not disposed any more. However, for backward compatibility the online state now needs to take this state into account. So a card/queue is offline when the device is not configured. Furthermore a device can't get switched from offline to online state when not configured. The config state is shown in sysfs at /sys/devices/ap/cardxx/config for the card and /sys/devices/ap/cardxx/xx.yyyy/config for each queue within each card. It is a read-only attribute reflecting the negation of the 'AP deconfig' state as it is noted in the AP documents. Signed-off-by: Harald Freudenberger Signed-off-by: Vasily Gorbik

s390/ap: add error response code field for ap queue devices

2020-10-07T19:50:00+00:00

On AP instruction failures the last response code is now kept in the struct ap_queue. There is also a new sysfs attribute showing this field (enabled only on debug kernels). Also slight rework of the AP_DBF macros to get some more content into one debug feature message line. Signed-off-by: Harald Freudenberger Signed-off-by: Vasily Gorbik

s390/ap: split ap queue state machine state from device state

2020-10-07T19:49:59+00:00

The state machine for each ap queue covered a mixture of device states and state machine (firmware queue state) states. This patch splits the device states and the state machine states into two different enums and variables. The major state is the device state with currently these values: AP_DEV_STATE_UNINITIATED - fresh and virgin, not touched AP_DEV_STATE_OPERATING - queue dev is working normal AP_DEV_STATE_SHUTDOWN - remove/unbind/shutdown in progress AP_DEV_STATE_ERROR - device is in error state only when the device state is > UNINITIATED the state machine is run. The state machine represents the states of the firmware queue: AP_SM_STATE_RESET_START - starting point, reset (RAPQ) ap queue AP_SM_STATE_RESET_WAIT - reset triggered, waiting to be finished if irqs enabled, set up irq (AQIC) AP_SM_STATE_SETIRQ_WAIT - enable irq triggered, waiting to be finished, then go to IDLE AP_SM_STATE_IDLE - queue is operational but empty AP_SM_STATE_WORKING - queue is operational, requests are stored and replies may wait for getting fetched AP_SM_STATE_QUEUE_FULL - firmware queue is full, so only replies can get fetched For debugging each ap queue shows a sysfs attribute 'states' which displays the device and state machine state and is only available when the kernel is build with CONFIG_ZCRYPT_DEBUG enabled. Signed-off-by: Harald Freudenberger Signed-off-by: Vasily Gorbik

s390/ap: rename and clarify ap state machine related stuff

2020-07-03T08:49:49+00:00

There is a state machine held for each ap queue device. The states and functions related to this where somethimes noted with _sm_ somethimes without. This patch clarifies and renames all the ap queue state machine related functions, enums and defines to have a _sm_ in the name. There is no functional change coming with this patch - it's only beautifying code. Signed-off-by: Harald Freudenberger Signed-off-by: Heiko Carstens