<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/scsi/scsi_error.c, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-30T09:27:36+00:00</updated>
<entry>
<title>scsi: core: Wake up the error handler when final completions race against each other</title>
<updated>2026-01-30T09:27:36+00:00</updated>
<author>
<name>David Jeffery</name>
<email>djeffery@redhat.com</email>
</author>
<published>2026-01-13T16:08:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9fdc6f28d5e81350ab1d2cac8389062bd09e61e1'/>
<id>urn:sha1:9fdc6f28d5e81350ab1d2cac8389062bd09e61e1</id>
<content type='text'>
[ Upstream commit fe2f8ad6f0999db3b318359a01ee0108c703a8c3 ]

The fragile ordering between marking commands completed or failed so
that the error handler only wakes when the last running command
completes or times out has race conditions. These race conditions can
cause the SCSI layer to fail to wake the error handler, leaving I/O
through the SCSI host stuck as the error state cannot advance.

First, there is an memory ordering issue within scsi_dec_host_busy().
The write which clears SCMD_STATE_INFLIGHT may be reordered with reads
counting in scsi_host_busy(). While the local CPU will see its own
write, reordering can allow other CPUs in scsi_dec_host_busy() or
scsi_eh_inc_host_failed() to see a raised busy count, causing no CPU to
see a host busy equal to the host_failed count.

This race condition can be prevented with a memory barrier on the error
path to force the write to be visible before counting host busy
commands.

Second, there is a general ordering issue with scsi_eh_inc_host_failed(). By
counting busy commands before incrementing host_failed, it can race with a
final command in scsi_dec_host_busy(), such that scsi_dec_host_busy() does
not see host_failed incremented but scsi_eh_inc_host_failed() counts busy
commands before SCMD_STATE_INFLIGHT is cleared by scsi_dec_host_busy(),
resulting in neither waking the error handler task.

This needs the call to scsi_host_busy() to be moved after host_failed is
incremented to close the race condition.

Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
Signed-off-by: David Jeffery &lt;djeffery@redhat.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://patch.msgid.link/20260113161036.6730-1-djeffery@redhat.com
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>scsi: core: Fix error handler encryption support</title>
<updated>2026-01-30T09:27:26+00:00</updated>
<author>
<name>Brian Kao</name>
<email>powenkao@google.com</email>
</author>
<published>2025-12-18T03:17:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8fab42516eb3cc37c0514f66e8ad91581bdef7a4'/>
<id>urn:sha1:8fab42516eb3cc37c0514f66e8ad91581bdef7a4</id>
<content type='text'>
commit 9a49157deeb23581fc5c8189b486340d7343264a upstream.

Some low-level drivers (LLD) access block layer crypto fields, such as
rq-&gt;crypt_keyslot and rq-&gt;crypt_ctx within `struct request`, to
configure hardware for inline encryption.  However, SCSI Error Handling
(EH) commands (e.g., TEST UNIT READY, START STOP UNIT) should not
involve any encryption setup.

To prevent drivers from erroneously applying crypto settings during EH,
this patch saves the original values of rq-&gt;crypt_keyslot and
rq-&gt;crypt_ctx before an EH command is prepared via scsi_eh_prep_cmnd().
These fields in the 'struct request' are then set to NULL.  The original
values are restored in scsi_eh_restore_cmnd() after the EH command
completes.

This ensures that the block layer crypto context does not leak into EH
command execution.

Signed-off-by: Brian Kao &lt;powenkao@google.com&gt;
Link: https://patch.msgid.link/20251218031726.2642834-1-powenkao@google.com
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>scsi: core: Move scsi_host_busy() out of host lock if it is for per-command</title>
<updated>2024-02-16T18:10:53+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-02-03T02:45:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aceb4ab9689bdda6cf27988ce0c107f4ef6a558d'/>
<id>urn:sha1:aceb4ab9689bdda6cf27988ce0c107f4ef6a558d</id>
<content type='text'>
[ Upstream commit 4e6c9011990726f4d175e2cdfebe5b0b8cce4839 ]

Commit 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock
for waking up EH handler") intended to fix a hard lockup issue triggered by
EH. The core idea was to move scsi_host_busy() out of the host lock when
processing individual commands for EH. However, a suggested style change
inadvertently caused scsi_host_busy() to remain under the host lock. Fix
this by calling scsi_host_busy() outside the lock.

Fixes: 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler")
Cc: Sathya Prakash Veerichetty &lt;safhya.prakash@broadcom.com&gt;
Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: Ewan D. Milne &lt;emilne@redhat.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20240203024521.2006455-1-ming.lei@redhat.com
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler</title>
<updated>2024-02-05T20:14:34+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-01-12T07:00:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=65ead8468c21c2676d4d06f50b46beffdea69df1'/>
<id>urn:sha1:65ead8468c21c2676d4d06f50b46beffdea69df1</id>
<content type='text'>
[ Upstream commit 4373534a9850627a2695317944898eb1283a2db0 ]

Inside scsi_eh_wakeup(), scsi_host_busy() is called &amp; checked with host
lock every time for deciding if error handler kthread needs to be waken up.

This can be too heavy in case of recovery, such as:

 - N hardware queues

 - queue depth is M for each hardware queue

 - each scsi_host_busy() iterates over (N * M) tag/requests

If recovery is triggered in case that all requests are in-flight, each
scsi_eh_wakeup() is strictly serialized, when scsi_eh_wakeup() is called
for the last in-flight request, scsi_host_busy() has been run for (N * M -
1) times, and request has been iterated for (N*M - 1) * (N * M) times.

If both N and M are big enough, hard lockup can be triggered on acquiring
host lock, and it is observed on mpi3mr(128 hw queues, queue depth 8169).

Fix the issue by calling scsi_host_busy() outside the host lock. We don't
need the host lock for getting busy count because host the lock never
covers that.

[mkp: Drop unnecessary 'busy' variables pointed out by Bart]

Cc: Ewan Milne &lt;emilne@redhat.com&gt;
Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com
Reviewed-by: Ewan D. Milne &lt;emilne@redhat.com&gt;
Reviewed-by: Sathya Prakash Veerichetty &lt;safhya.prakash@broadcom.com&gt;
Tested-by: Sathya Prakash Veerichetty &lt;safhya.prakash@broadcom.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>scsi: core: Kick the requeue list after inserting when flushing</title>
<updated>2024-02-01T00:18:48+00:00</updated>
<author>
<name>Niklas Cassel</name>
<email>cassel@kernel.org</email>
</author>
<published>2024-01-11T12:05:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3b4b35d7cd63d2f3d28484cf3d38a0afd25c9719'/>
<id>urn:sha1:3b4b35d7cd63d2f3d28484cf3d38a0afd25c9719</id>
<content type='text'>
[ Upstream commit 6df0e077d76bd144c533b61d6182676aae6b0a85 ]

When libata calls ata_link_abort() to abort all ata queued commands, it
calls blk_abort_request() on the SCSI command representing each QC.

This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add() for
each SCSI command.

scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds the
command to shost-&gt;eh_cmd_q.

This will wake up the SCSI EH, and eventually the libata EH strategy
handler will be called, which calls scsi_eh_flush_done_q() to either flush
retry or flush finish each failed command.

The commands that are flush retried by scsi_eh_flush_done_q() are done so
using scsi_queue_insert().

Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with the
second argument set to true, indicating that it should always kick/run the
requeue list after inserting.

After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() does not kick/run the requeue list after
inserting, if the current SCSI host state is recovery (which is the case in
the libata example above).

This optimization is probably fine in most cases, as I can only assume that
most often someone will eventually kick/run the queues.

However, that is not the case for scsi_eh_flush_done_q(), where we can see
that the request gets inserted to the requeue list, but the queue is never
started after the request has been inserted, leading to the block layer
waiting for the completion of command that never gets to run.

Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host
state is most likely always in recovery when this function is called.

Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after
inserting a flush retry command, so that scsi_eh_flush_done_q() keeps the
same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick the
requeue list if necessary").

Simple reproducer for the libata example above:
$ hdparm -Y /dev/sda
$ echo 1 &gt; /sys/class/scsi_device/0\:0\:0\:0/device/delete

Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
Reported-by: Kevin Locke &lt;kevin@kevinlocke.name&gt;
Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@kevinlocke.name/
Signed-off-by: Niklas Cassel &lt;cassel@kernel.org&gt;
Link: https://lore.kernel.org/r/20240111120533.3612509-1-cassel@kernel.org
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>scsi: core: Always send batch on reset or error handling command</title>
<updated>2024-01-01T12:42:45+00:00</updated>
<author>
<name>Alexander Atanasov</name>
<email>alexander.atanasov@virtuozzo.com</email>
</author>
<published>2023-12-15T12:10:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d7ef2eeec259eb28c0273132c1a73aa069cab13a'/>
<id>urn:sha1:d7ef2eeec259eb28c0273132c1a73aa069cab13a</id>
<content type='text'>
commit 066c5b46b6eaf2f13f80c19500dbb3b84baabb33 upstream.

In commit 8930a6c20791 ("scsi: core: add support for request batching") the
block layer bd-&gt;last flag was mapped to SCMD_LAST and used as an indicator
to send the batch for the drivers that implement this feature. However, the
error handling code was not updated accordingly.

scsi_send_eh_cmnd() is used to send error handling commands and request
sense. The problem is that request sense comes as a single command that
gets into the batch queue and times out. As a result the device goes
offline after several failed resets. This was observed on virtio_scsi
during a device resize operation.

[  496.316946] sd 0:0:4:0: [sdd] tag#117 scsi_eh_0: requesting sense
[  506.786356] sd 0:0:4:0: [sdd] tag#117 scsi_send_eh_cmnd timeleft: 0
[  506.787981] sd 0:0:4:0: [sdd] tag#117 abort

To fix this always set SCMD_LAST flag in scsi_send_eh_cmnd() and
scsi_reset_ioctl().

Fixes: 8930a6c20791 ("scsi: core: add support for request batching")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Alexander Atanasov &lt;alexander.atanasov@virtuozzo.com&gt;
Link: https://lore.kernel.org/r/20231215121008.2881653-1-alexander.atanasov@virtuozzo.com
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>scsi: sd: Handle read/write CDL timeout failures</title>
<updated>2023-05-22T21:05:19+00:00</updated>
<author>
<name>Niklas Cassel</name>
<email>niklas.cassel@wdc.com</email>
</author>
<published>2023-05-11T01:13:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=390e2d1a587405a522dc6b433d45648f895a352c'/>
<id>urn:sha1:390e2d1a587405a522dc6b433d45648f895a352c</id>
<content type='text'>
Commands using a duration limit descriptor that has limit policies set to a
value other than 0x0 may be failed by the device if one of the limits are
exceeded. For such commands, since the failure is the result of the user
duration limit configuration and workload, the commands should not be
retried and terminated immediately. Furthermore, to allow the user to
differentiate these "soft" failures from hard errors due to hardware
problem, a different error code than EIO should be returned.

There are 2 cases to consider:

(1) The failure is due to a limit policy failing the command with a check
condition sense key, that is, any limit policy other than 0xD.  For this
case, scsi_check_sense() is modified to detect failures with the ABORTED
COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING or COMMAND
TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR
RECOVERY additional sense code. For these failures, a SUCCESS disposition
is returned so that scsi_finish_command() is called to terminate the
command.

(2) The failure is due to a limit policy set to 0xD, which result in the
command being terminated with a GOOD status, COMPLETED sense key, and DATA
CURRENTLY UNAVAILABLE additional sense code. To handle this case, the
scsi_check_sense() is modified to return a SUCCESS disposition so that
scsi_finish_command() is called to terminate the command.  In addition,
scsi_decide_disposition() has to be modified to see if a command being
terminated with GOOD status has sense data.  This is as defined in SCSI
Primary Commands - 6 (SPC-6), so all according to spec, even if GOOD status
commands were not checked before.

If scsi_check_sense() detects sense data representing a duration limit,
scsi_check_sense() will set the newly introduced SCSI ML byte
SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in scsi_noretry_cmd(),
so that a command that failed because of a CDL timeout cannot be
retried. The SCSI ML byte is also checked in scsi_result_to_blk_status() to
complete the command request with the BLK_STS_DURATION_LIMIT status, which
result in the user seeing ETIME errors for the failed commands.

Co-developed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Niklas Cassel &lt;niklas.cassel@wdc.com&gt;
Link: https://lore.kernel.org/r/20230511011356.227789-12-nks@flawful.org
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>scsi: core: Allow libata to complete successful commands via EH</title>
<updated>2023-05-22T21:05:18+00:00</updated>
<author>
<name>Niklas Cassel</name>
<email>niklas.cassel@wdc.com</email>
</author>
<published>2023-05-11T01:13:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3d848ca1ebc8d8864f25bd461914c93eff82a2d2'/>
<id>urn:sha1:3d848ca1ebc8d8864f25bd461914c93eff82a2d2</id>
<content type='text'>
In SCSI, we get the sense data as part of the completion, for ATA however,
we need to fetch the sense data as an extra step. For an aborted ATA
command the sense data is fetched via libata's -&gt;eh_strategy_handler().

For Command Duration Limits policy 0xD:

  The device shall complete the command without error with the additional
  sense code set to DATA CURRENTLY UNAVAILABLE.

In order to handle this policy in libata, we intend to send a successful
command via SCSI EH, and let libata's -&gt;eh_strategy_handler() fetch the
sense data for the good command. This is similar to how we handle an
aborted ATA command, just that we need to read the Successful NCQ Commands
log instead of the NCQ Command Error log.

When we get a SATA completion with successful commands, ATA_SENSE will be
set, indicating that some commands in the completion have sense data.

The sense_valid bitmask in the Sense Data for Successful NCQ Commands log
will inform exactly which commands that had sense data, which might be a
subset of all the commands that was completed in the same completion. (Yet
all will have ATA_SENSE set, since the status is per completion.)

The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE" sense
data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will not set
the scmd-&gt;result to DID_TIME_OUT for these commands. However, the
successful commands that did not have sense data, must not get their result
marked as DID_TIME_OUT by SCSI EH.

Add a new flag SCMD_FORCE_EH_SUCCESS, which tells SCSI EH to not mark a
command as DID_TIME_OUT, even if it has scmd-&gt;result == SAM_STAT_GOOD.

This will be used by libata in a subsequent commit.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Niklas Cassel &lt;niklas.cassel@wdc.com&gt;
Link: https://lore.kernel.org/r/20230511011356.227789-5-nks@flawful.org
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>scsi: core: Declare most SCSI host template pointers const</title>
<updated>2023-03-24T23:19:19+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2023-03-22T19:53:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=31435de9746670d884f84a3c094a401aa27747aa'/>
<id>urn:sha1:31435de9746670d884f84a3c094a401aa27747aa</id>
<content type='text'>
Prepare for constifying most SCSI host template pointers by constifying the
SCSI host template pointer arguments and variables in the SCSI core.

Reviewed-by: Benjamin Block &lt;bblock@linux.ibm.com&gt;
Reviewed-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hannes Reinecke &lt;hare@suse.de&gt;
Cc: Mike Christie &lt;michael.christie@oracle.com&gt;
Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20230322195515.1267197-3-bvanassche@acm.org
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>Merge branch '6.2/scsi-queue' into 6.2/scsi-fixes</title>
<updated>2022-12-30T16:29:34+00:00</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2022-12-30T16:29:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6b1c374c45605504ed32e855c4e0f9b652a1978e'/>
<id>urn:sha1:6b1c374c45605504ed32e855c4e0f9b652a1978e</id>
<content type='text'>
Pull in remaining patches from the 6.2 queue.

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
</feed>
