<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/s390/crypto/ap_queue.c, branch v5.10.257</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-11-18T13:04:30+00:00</updated>
<entry>
<title>s390/ap: Fix hanging ioctl caused by orphaned replies</title>
<updated>2021-11-18T13:04:30+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2021-10-14T07:58:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c9ca9669dec322335da6f6ce0c6da7c48391f032'/>
<id>urn:sha1:c9ca9669dec322335da6f6ce0c6da7c48391f032</id>
<content type='text'>
commit 3826350e6dd435e244eb6e47abad5a47c169ebc2 upstream.

When a queue is switched to soft offline during heavy load and later
switched to soft online again and now used, it may be that the caller
is blocked forever in the ioctl call.

The failure occurs because there is a pending reply after the queue(s)
have been switched to offline. This orphaned reply is received when
the queue is switched to online and is accidentally counted for the
outstanding replies. So when there was a valid outstanding reply and
this orphaned reply is received it counts as the outstanding one thus
dropping the outstanding counter to 0. Voila, with this counter the
receive function is not called any more and the real outstanding reply
is never received (until another request comes in...) and the ioctl
blocks.

The fix is simple. However, instead of readjusting the counter when an
orphaned reply is detected, I check the queue status for not empty and
compare this to the outstanding counter. So if the queue is not empty
then the counter must not drop to 0 but at least have a value of 1.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>s390/ap: fix state machine hang after failure to enable irq</title>
<updated>2021-09-15T07:50:27+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2021-08-25T08:55:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a758b1d4ca207a5f03a69a210a8b90d86fdd93a1'/>
<id>urn:sha1:a758b1d4ca207a5f03a69a210a8b90d86fdd93a1</id>
<content type='text'>
[ Upstream commit cabebb697c98fb1f05cc950a747a9b6ec61a5b01 ]

If for any reason the interrupt enable for an ap queue fails the
state machine run for the queue returned wrong return codes to the
caller. So the caller assumed interrupt support for this queue in
enabled and thus did not re-establish the high resolution timer used
for polling. In the end this let to a hang for the user space process
waiting "forever" for the reply.

This patch reworks these return codes to return correct indications
for the caller to re-establish the timer when a queue runs without
interrupt support.

Please note that this is fixing a wrong behavior after a first
failure (enable interrupt support for the queue) failed. However,
looks like this occasionally happens on KVM systems.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>s390/ap: Fix hanging ioctl caused by wrong msg counter</title>
<updated>2021-06-23T12:42:51+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2021-06-01T06:27:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b516daed991359ba0433dff31862ae4df30b4150'/>
<id>urn:sha1:b516daed991359ba0433dff31862ae4df30b4150</id>
<content type='text'>
commit e73a99f3287a740a07d6618e9470f4d6cb217da8 upstream.

When a AP queue is switched to soft offline, all pending
requests are purged out of the pending requests list and
'received' by the upper layer like zcrypt device drivers.
This is also done for requests which are already enqueued
into the firmware queue. A request in a firmware queue
may eventually produce an response message, but there is
no waiting process any more. However, the response was
counted with the queue_counter and as this counter was
reset to 0 with the offline switch, the pending response
caused the queue_counter to get negative. The next request
increased this counter to 0 (instead of 1) which caused
the ap code to assume there is nothing to receive and so
the response for this valid request was never tried to
fetch from the firmware queue.

This all caused a queue to not work properly after a
switch offline/online and in the end processes to hang
forever when trying to send a crypto request after an
queue offline/online switch cicle.

Fixed by a) making sure the counter does not drop below 0
and b) on a successful enqueue of a message has at least
a value of 1.

Additionally a warning is emitted, when a reply can't get
assigned to a waiting process. This may be normal operation
(process had timeout or has been killed) but may give a
hint that something unexpected happened (like this odd
behavior described above).

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>s390/zcrypt: fix wrong format specifications</title>
<updated>2020-10-09T21:45:30+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-10-08T06:43:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4366dd7251259806e57251cb2d699f0863841775'/>
<id>urn:sha1:4366dd7251259806e57251cb2d699f0863841775</id>
<content type='text'>
Fixes 5 wrong format specification findings found by the
kernel test robot in ap_queue.c:

warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
                               __func__, status.response_code,

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Fixes: 2ea2a6099ae3 ("s390/ap: add error response code field for ap queue devices")
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/zcrypt: Introduce Failure Injection feature</title>
<updated>2020-10-07T19:50:01+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-09-29T14:07:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=27c4f6738bdc535e42dfc1295dadc78ab7582939'/>
<id>urn:sha1:27c4f6738bdc535e42dfc1295dadc78ab7582939</id>
<content type='text'>
Introduce a way to specify additional debug flags with an crpyto
request to be able to trigger certain failures within the zcrypt
device drivers and/or ap core code.

This failure injection possibility is only enabled with a kernel debug
build CONFIG_ZCRYPT_DEBUG) and should never be available on a regular
kernel running in production environment.

Details:

* The ioctl(ICARSAMODEXPO) get's a struct ica_rsa_modexpo. If the
  leftmost bit of the 32 bit unsigned int inputdatalength field is
  set, the uppermost 16 bits are separated and used as debug flag
  value. The process is checked to have the CAP_SYS_ADMIN capability
  enabled or EPERM is returned.

* The ioctl(ICARSACRT) get's a struct ica_rsa_modexpo_crt. If the
  leftmost bit of the 32 bit unsigned int inputdatalength field is set,
  the uppermost 16 bits are separated and used als debug flag
  value. The process is checked to have the CAP_SYS_ADMIN capability
  enabled or EPERM is returned.

* The ioctl(ZSECSENDCPRB) used to send CCA CPRBs get's a struct
  ica_xcRB. If the leftmost bit of the 32 bit unsigned int status
  field is set, the uppermost 16 bits of this field are used as debug
  flag value. The process is checked to have the CAP_SYS_ADMIN
  capability enabled or EPERM is returned.

* The ioctl(ZSENDEP11CPRB) used to send EP11 CPRBs get's a struct
  ep11_urb. If the leftmost bit of the 64 bit unsigned int req_len
  field is set, the uppermost 16 bits of this field are used as debug
  flag value. The process is checked to have the CAP_SYS_ADMIN
  capability enabled or EPERM is returned.

So it is possible to send an additional 16 bit value to the zcrypt API
to be used to carry a failure injection command which may trigger
special behavior within the zcrypt API and layers below. This 16 bit
value is for the rest of the test referred as 'fi command' for Failure
Injection.

The lower 8 bits of the fi command construct a numerical argument in
the range of 1-255 and is the 'fi action' to be performed with the
request or the resulting reply:

* 0x00 (all requests): No failure injection action but flags may be
  provided which may affect the processing of the request or reply.
* 0x01 (only CCA CPRBs): The CPRB's agent_ID field is set to
  'FF'. This results in an reply code 0x90 (Transport-Protocol
  Failure).
* 0x02 (only CCA CPRBs): After the APQN to send to has been chosen,
  the domain field within the CPRB is overwritten with value 99 to
  enforce an reply with RY 0x8A.
* 0x03 (all requests): At NQAP invocation the invalid qid value 0xFF00
  is used causing an response code of 0x01 (AP queue not valid).

The upper 8 bits of the fi command may carry bit flags which may
influence the processing of an request or response:

* 0x01: No retry. If this bit is set, the usual loop in the zcrypt API
  which retries an CPRB up to 10 times when the lower layers return
  with EAGAIN is abandoned after the first attempt to send the CPRB.
* 0x02: Toggle special. Toggles the special bit on this request. This
  should result in an reply code RY~0x41 and result in an ioctl
  failure with errno EINVAL.

This failure injection possibilities may get some further extensions
in the future. As of now this is a starting point for Continuous Test
and Integration to trigger some failures and watch for the reaction of
the ap bus and zcrypt device driver code.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap/zcrypt: revisit ap and zcrypt error handling</title>
<updated>2020-10-07T19:50:01+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-08-04T07:27:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e0332629e33d1926c93348d918aaaf451ef9a16b'/>
<id>urn:sha1:e0332629e33d1926c93348d918aaaf451ef9a16b</id>
<content type='text'>
Revisit the ap queue error handling: Based on discussions and
evaluatios with the firmware folk here is now a rework of the response
code handling for all the AP instructions. The idea is to distinguish
between failures because of some kind of invalid request where a retry
does not make any sense and a failure where another attempt to send
the very same request may succeed. The first case is handled by
returning EINVAL to the userspace application. The second case results
in retries within the zcrypt API controlled by a per message retry
counter.

Revisit the zcrpyt error handling: Similar here, based on discussions
with the firmware people here comes a rework of the handling of all
the reply codes.  Main point here is that there are only very few
cases left, where a zcrypt device queue is switched to offline. It
should never be the case that an AP reply message is 'unknown' to the
device driver as it indicates a total mismatch between device driver
and crypto card firmware. In all other cases, the code distinguishes
between failure because of invalid message (see above - EINVAL) or
failures of the infrastructure (see above - EAGAIN).

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap: add card/queue deconfig state</title>
<updated>2020-10-07T19:50:00+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-07-02T14:57:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4f2fcccdb547b09a4532c705078811e672fb9235'/>
<id>urn:sha1:4f2fcccdb547b09a4532c705078811e672fb9235</id>
<content type='text'>
This patch adds a new config state to the ap card and queue
devices. This state reflects the response code
0x03 "AP deconfigured" on TQAP invocation and is tracked with
every ap bus scan.

Together with this new state now a card/queue device which
is 'deconfigured' is not disposed any more. However, for backward
compatibility the online state now needs to take this state into
account. So a card/queue is offline when the device is not configured.
Furthermore a device can't get switched from offline to online state
when not configured.

The config state is shown in sysfs at
  /sys/devices/ap/cardxx/config
for the card and
  /sys/devices/ap/cardxx/xx.yyyy/config
for each queue within each card.
It is a read-only attribute reflecting the negation of the
'AP deconfig' state as it is noted in the AP documents.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap: add error response code field for ap queue devices</title>
<updated>2020-10-07T19:50:00+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-07-02T13:56:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2ea2a6099ae3d1708f90f43c81a98cba3d4bb74c'/>
<id>urn:sha1:2ea2a6099ae3d1708f90f43c81a98cba3d4bb74c</id>
<content type='text'>
On AP instruction failures the last response code is now
kept in the struct ap_queue. There is also a new sysfs
attribute showing this field (enabled only on debug kernels).

Also slight rework of the AP_DBF macros to get some more
content into one debug feature message line.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap: split ap queue state machine state from device state</title>
<updated>2020-10-07T19:49:59+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-07-02T09:22:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b641cbd24445e56073c69dd046be488dcf1965b'/>
<id>urn:sha1:0b641cbd24445e56073c69dd046be488dcf1965b</id>
<content type='text'>
The state machine for each ap queue covered a mixture of
device states and state machine (firmware queue state) states.

This patch splits the device states and the state machine
states into two different enums and variables. The major
state is the device state with currently these values:

  AP_DEV_STATE_UNINITIATED - fresh and virgin, not touched
  AP_DEV_STATE_OPERATING   - queue dev is working normal
  AP_DEV_STATE_SHUTDOWN	   - remove/unbind/shutdown in progress
  AP_DEV_STATE_ERROR	   - device is in error state

only when the device state is &gt; UNINITIATED the state machine
is run. The state machine represents the states of the firmware
queue:

  AP_SM_STATE_RESET_START - starting point, reset (RAPQ) ap queue
  AP_SM_STATE_RESET_WAIT  - reset triggered, waiting to be finished
			    if irqs enabled, set up irq (AQIC)
  AP_SM_STATE_SETIRQ_WAIT - enable irq triggered, waiting to be
			    finished, then go to IDLE
  AP_SM_STATE_IDLE	  - queue is operational but empty
  AP_SM_STATE_WORKING	  - queue is operational, requests are stored
			    and replies may wait for getting fetched
  AP_SM_STATE_QUEUE_FULL  - firmware queue is full, so only replies
			    can get fetched

For debugging each ap queue shows a sysfs attribute 'states' which
displays the device and state machine state and is only available
when the kernel is build with CONFIG_ZCRYPT_DEBUG enabled.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap: rename and clarify ap state machine related stuff</title>
<updated>2020-07-03T08:49:49+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-05-26T08:49:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc4b6ded3c17ebe1d7532943192b2308c031c43b'/>
<id>urn:sha1:dc4b6ded3c17ebe1d7532943192b2308c031c43b</id>
<content type='text'>
There is a state machine held for each ap queue device.
The states and functions related to this where somethimes
noted with _sm_ somethimes without. This patch clarifies
and renames all the ap queue state machine related functions,
enums and defines to have a _sm_ in the name.

There is no functional change coming with this patch - it's
only beautifying code.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
</content>
</entry>
</feed>
