<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/s390/crypto/ap_queue.c, branch v5.15.208</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.208</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.208'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-11-18T18:17:17+00:00</updated>
<entry>
<title>s390/ap: Fix hanging ioctl caused by orphaned replies</title>
<updated>2021-11-18T18:17:17+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2021-10-14T07:58:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3ef2272417f49edcecd349151345378c24ad7b04'/>
<id>urn:sha1:3ef2272417f49edcecd349151345378c24ad7b04</id>
<content type='text'>
commit 3826350e6dd435e244eb6e47abad5a47c169ebc2 upstream.

When a queue is switched to soft offline during heavy load and later
switched to soft online again and now used, it may be that the caller
is blocked forever in the ioctl call.

The failure occurs because there is a pending reply after the queue(s)
have been switched to offline. This orphaned reply is received when
the queue is switched to online and is accidentally counted for the
outstanding replies. So when there was a valid outstanding reply and
this orphaned reply is received it counts as the outstanding one thus
dropping the outstanding counter to 0. Voila, with this counter the
receive function is not called any more and the real outstanding reply
is never received (until another request comes in...) and the ioctl
blocks.

The fix is simple. However, instead of readjusting the counter when an
orphaned reply is detected, I check the queue status for not empty and
compare this to the outstanding counter. So if the queue is not empty
then the counter must not drop to 0 but at least have a value of 1.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>s390/ap: fix kernel doc comments</title>
<updated>2021-09-15T12:29:21+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>hca@linux.ibm.com</email>
</author>
<published>2021-09-07T05:27:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=948e50551b9a07c3ab6b9d208bb8f16fa1b2ad41'/>
<id>urn:sha1:948e50551b9a07c3ab6b9d208bb8f16fa1b2ad41</id>
<content type='text'>
Get rid of warnings like:

drivers/s390/crypto/ap_bus.c:216: warning:
 bad line:
drivers/s390/crypto/ap_bus.c:444:
 warning: Function parameter or member 'floating' not described in 'ap_interrupt_handler'

Reviewed-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap: fix state machine hang after failure to enable irq</title>
<updated>2021-08-26T18:22:12+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2021-08-25T08:55:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cabebb697c98fb1f05cc950a747a9b6ec61a5b01'/>
<id>urn:sha1:cabebb697c98fb1f05cc950a747a9b6ec61a5b01</id>
<content type='text'>
If for any reason the interrupt enable for an ap queue fails the
state machine run for the queue returned wrong return codes to the
caller. So the caller assumed interrupt support for this queue in
enabled and thus did not re-establish the high resolution timer used
for polling. In the end this let to a hang for the user space process
waiting "forever" for the reply.

This patch reworks these return codes to return correct indications
for the caller to re-establish the timer when a queue runs without
interrupt support.

Please note that this is fixing a wrong behavior after a first
failure (enable interrupt support for the queue) failed. However,
looks like this occasionally happens on KVM systems.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap: Rework ap_dqap to deal with messages greater than recv buffer</title>
<updated>2021-07-08T13:37:27+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2021-06-30T14:10:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1f0d22defd59f603d63ba51483eeb8d72726ce8b'/>
<id>urn:sha1:1f0d22defd59f603d63ba51483eeb8d72726ce8b</id>
<content type='text'>
Rework of the ap_dqap() inline function with the dqap inline assembler
invocation and the caller code in ap_queue.c to be able to handle
replies which exceed the receive buffer size.

ap_dqap() now provides two additional parameters to handle together
with the caller the case where a reply in the firmware queue entry
exceeds the given message buffer size. It depends on the caller how to
exactly handle this. The behavior implemented now by ap_sm_recv() in
ap_queue.c is to simple purge this entry from the firmware queue and
let the caller 'receive' a -EMSGSIZE for the request without
delivering any reply data - not even a truncated reply message.

However, the reworked ap_dqap() could now get invoked in a way that
the message is received in multiple parts and the caller assembles the
parts into one reply message.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Suggested-by: Juergen Christ &lt;jchrist@linux.ibm.com&gt;
Reviewed-by: Juergen Christ &lt;jchrist@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/AP: support new dynamic AP bus size limit</title>
<updated>2021-07-05T10:44:23+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2021-06-25T10:29:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bd39654a2282c1a51c044575a6bc00d641d5dfd1'/>
<id>urn:sha1:bd39654a2282c1a51c044575a6bc00d641d5dfd1</id>
<content type='text'>
This patch provides support for new dynamic AP bus message limit
with the existing zcrypt device driver and AP bus core code.

There is support for a new field 'ml' from TAPQ query. The field
gives if != 0 the AP bus limit for this card in 4k chunk units.
The actual message size limit per card is shown as a new read-only
sysfs attribute. The sysfs attribute

  /sys/devices/ap/cardxx/max_msg_size

shows the upper limit in bytes used by the AP bus and zcrypt device
driver for requests and replies send to and received from this card.
Currently up to CEX7 support only max 12kB msg size and thus the field
shows 12288 meaning the upper limit of a valid msg for this card is
12kB. Please note that the usable payload is somewhat lower and
depends on the msg type and thus the header struct which is to be
prepended by the zcrypt dd.

The dispatcher responsible for choosing the right card and queue is
aware of the individual card AP bus message limit. So a request is
only assigned to a queue of a card which is able to handle the size of
the request (e.g. a 14kB request will never go to a max 12kB card).
If no such card is found the ioctl will fail with ENODEV.

The reply buffer held by the device driver is determined by the ml
field of the TAPQ for this card. If a response from the card exceeds
this limit however, the response is not truncated but the ioctl for
this request will fail with errno EMSGSIZE to indicate that the device
driver has dropped the response because it would overflow the buffer
limit.

If the request size does not indicate to the dispatcher that an
adapter with extended limit is to be used, a random card will be
chosen when no specific card is addressed (ANY addressing). This may
result in an ioctl failure when the reply size needs an adapter with
extended limit but the randomly chosen one is not capable of handling
the broader reply size. The user space application needs to use
dedicated addressing to forward such a request only to suitable cards
to get requests like this processed properly.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Reviewed-by: Ingo Tuchscherer &lt;ingo.tuchscherer@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap: Fix hanging ioctl caused by wrong msg counter</title>
<updated>2021-06-16T21:32:02+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2021-06-01T06:27:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e73a99f3287a740a07d6618e9470f4d6cb217da8'/>
<id>urn:sha1:e73a99f3287a740a07d6618e9470f4d6cb217da8</id>
<content type='text'>
When a AP queue is switched to soft offline, all pending
requests are purged out of the pending requests list and
'received' by the upper layer like zcrypt device drivers.
This is also done for requests which are already enqueued
into the firmware queue. A request in a firmware queue
may eventually produce an response message, but there is
no waiting process any more. However, the response was
counted with the queue_counter and as this counter was
reset to 0 with the offline switch, the pending response
caused the queue_counter to get negative. The next request
increased this counter to 0 (instead of 1) which caused
the ap code to assume there is nothing to receive and so
the response for this valid request was never tried to
fetch from the firmware queue.

This all caused a queue to not work properly after a
switch offline/online and in the end processes to hang
forever when trying to send a crypto request after an
queue offline/online switch cicle.

Fixed by a) making sure the counter does not drop below 0
and b) on a successful enqueue of a message has at least
a value of 1.

Additionally a warning is emitted, when a reply can't get
assigned to a waiting process. This may be normal operation
(process had timeout or has been killed) but may give a
hint that something unexpected happened (like this odd
behavior described above).

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/zcrypt: fix wrong format specifications</title>
<updated>2020-10-09T21:45:30+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-10-08T06:43:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4366dd7251259806e57251cb2d699f0863841775'/>
<id>urn:sha1:4366dd7251259806e57251cb2d699f0863841775</id>
<content type='text'>
Fixes 5 wrong format specification findings found by the
kernel test robot in ap_queue.c:

warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
                               __func__, status.response_code,

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Fixes: 2ea2a6099ae3 ("s390/ap: add error response code field for ap queue devices")
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/zcrypt: Introduce Failure Injection feature</title>
<updated>2020-10-07T19:50:01+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-09-29T14:07:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=27c4f6738bdc535e42dfc1295dadc78ab7582939'/>
<id>urn:sha1:27c4f6738bdc535e42dfc1295dadc78ab7582939</id>
<content type='text'>
Introduce a way to specify additional debug flags with an crpyto
request to be able to trigger certain failures within the zcrypt
device drivers and/or ap core code.

This failure injection possibility is only enabled with a kernel debug
build CONFIG_ZCRYPT_DEBUG) and should never be available on a regular
kernel running in production environment.

Details:

* The ioctl(ICARSAMODEXPO) get's a struct ica_rsa_modexpo. If the
  leftmost bit of the 32 bit unsigned int inputdatalength field is
  set, the uppermost 16 bits are separated and used as debug flag
  value. The process is checked to have the CAP_SYS_ADMIN capability
  enabled or EPERM is returned.

* The ioctl(ICARSACRT) get's a struct ica_rsa_modexpo_crt. If the
  leftmost bit of the 32 bit unsigned int inputdatalength field is set,
  the uppermost 16 bits are separated and used als debug flag
  value. The process is checked to have the CAP_SYS_ADMIN capability
  enabled or EPERM is returned.

* The ioctl(ZSECSENDCPRB) used to send CCA CPRBs get's a struct
  ica_xcRB. If the leftmost bit of the 32 bit unsigned int status
  field is set, the uppermost 16 bits of this field are used as debug
  flag value. The process is checked to have the CAP_SYS_ADMIN
  capability enabled or EPERM is returned.

* The ioctl(ZSENDEP11CPRB) used to send EP11 CPRBs get's a struct
  ep11_urb. If the leftmost bit of the 64 bit unsigned int req_len
  field is set, the uppermost 16 bits of this field are used as debug
  flag value. The process is checked to have the CAP_SYS_ADMIN
  capability enabled or EPERM is returned.

So it is possible to send an additional 16 bit value to the zcrypt API
to be used to carry a failure injection command which may trigger
special behavior within the zcrypt API and layers below. This 16 bit
value is for the rest of the test referred as 'fi command' for Failure
Injection.

The lower 8 bits of the fi command construct a numerical argument in
the range of 1-255 and is the 'fi action' to be performed with the
request or the resulting reply:

* 0x00 (all requests): No failure injection action but flags may be
  provided which may affect the processing of the request or reply.
* 0x01 (only CCA CPRBs): The CPRB's agent_ID field is set to
  'FF'. This results in an reply code 0x90 (Transport-Protocol
  Failure).
* 0x02 (only CCA CPRBs): After the APQN to send to has been chosen,
  the domain field within the CPRB is overwritten with value 99 to
  enforce an reply with RY 0x8A.
* 0x03 (all requests): At NQAP invocation the invalid qid value 0xFF00
  is used causing an response code of 0x01 (AP queue not valid).

The upper 8 bits of the fi command may carry bit flags which may
influence the processing of an request or response:

* 0x01: No retry. If this bit is set, the usual loop in the zcrypt API
  which retries an CPRB up to 10 times when the lower layers return
  with EAGAIN is abandoned after the first attempt to send the CPRB.
* 0x02: Toggle special. Toggles the special bit on this request. This
  should result in an reply code RY~0x41 and result in an ioctl
  failure with errno EINVAL.

This failure injection possibilities may get some further extensions
in the future. As of now this is a starting point for Continuous Test
and Integration to trigger some failures and watch for the reaction of
the ap bus and zcrypt device driver code.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap/zcrypt: revisit ap and zcrypt error handling</title>
<updated>2020-10-07T19:50:01+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-08-04T07:27:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e0332629e33d1926c93348d918aaaf451ef9a16b'/>
<id>urn:sha1:e0332629e33d1926c93348d918aaaf451ef9a16b</id>
<content type='text'>
Revisit the ap queue error handling: Based on discussions and
evaluatios with the firmware folk here is now a rework of the response
code handling for all the AP instructions. The idea is to distinguish
between failures because of some kind of invalid request where a retry
does not make any sense and a failure where another attempt to send
the very same request may succeed. The first case is handled by
returning EINVAL to the userspace application. The second case results
in retries within the zcrypt API controlled by a per message retry
counter.

Revisit the zcrpyt error handling: Similar here, based on discussions
with the firmware people here comes a rework of the handling of all
the reply codes.  Main point here is that there are only very few
cases left, where a zcrypt device queue is switched to offline. It
should never be the case that an AP reply message is 'unknown' to the
device driver as it indicates a total mismatch between device driver
and crypto card firmware. In all other cases, the code distinguishes
between failure because of invalid message (see above - EINVAL) or
failures of the infrastructure (see above - EAGAIN).

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/ap: add card/queue deconfig state</title>
<updated>2020-10-07T19:50:00+00:00</updated>
<author>
<name>Harald Freudenberger</name>
<email>freude@linux.ibm.com</email>
</author>
<published>2020-07-02T14:57:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4f2fcccdb547b09a4532c705078811e672fb9235'/>
<id>urn:sha1:4f2fcccdb547b09a4532c705078811e672fb9235</id>
<content type='text'>
This patch adds a new config state to the ap card and queue
devices. This state reflects the response code
0x03 "AP deconfigured" on TQAP invocation and is tracked with
every ap bus scan.

Together with this new state now a card/queue device which
is 'deconfigured' is not disposed any more. However, for backward
compatibility the online state now needs to take this state into
account. So a card/queue is offline when the device is not configured.
Furthermore a device can't get switched from offline to online state
when not configured.

The config state is shown in sysfs at
  /sys/devices/ap/cardxx/config
for the card and
  /sys/devices/ap/cardxx/xx.yyyy/config
for each queue within each card.
It is a read-only attribute reflecting the negation of the
'AP deconfig' state as it is noted in the AP documents.

Signed-off-by: Harald Freudenberger &lt;freude@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
</feed>
