kernel/linux.git/drivers/scsi/scsi_debug.c, branch v5.9.12

scsi: scsi_debug: Remove superfluous close zone in resp_open_zone()

2020-08-25T03:02:18+00:00

resp_open_zone() always calls zbc_open_zone() with parameter explicit set to true. If zbc_open_zone() is called with parameter explicit set to true, and the current zone state is implicit open, it will call zbc_close_zone() on the zone before proceeding. Therefore, there is no need for resp_open_zone() to call zbc_close_zone() on an implicitly open zone before calling zbc_open_zone(). Remove superfluous close zone in resp_open_zone(). Link: https://lore.kernel.org/r/20200821130007.39938-1-niklas.cassel@wdc.com Reviewed-by: Damien Le Moal Signed-off-by: Niklas Cassel Signed-off-by: Martin K. Petersen

scsi: scsi_debug: Fix scp is NULL errors

2020-08-18T02:13:17+00:00

John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures that were mainly seen on aarch64 machines (e.g. RPi 4 with four A72 CPUs). The problem was tracked down to a missing critical section on a "short circuit" path. Namely, the time to process the current command so far has already exceeded the requested command duration (i.e. the number of nanoseconds in the ndelay parameter). The random=1 parameter setting was pivotal in finding this error. The failure scenario involved first taking that "short circuit" path (due to a very short command duration) and then taking the more likely hrtimer_start() path (due to a longer command duration). With random=1 each command's duration is taken from the uniformly distributed [0..ndelay) interval. The fio utility also helped by reliably generating the error scenario at about once per minute on a RPi 4 (64 bit OS). Link: https://lore.kernel.org/r/20200813155738.109298-1-dgilbert@interlog.com Reported-by: John Garry Reviewed-by: Lee Duncan Signed-off-by: Douglas Gilbert Signed-off-by: Martin K. Petersen

scsi: scsi_debug: Implement tur_ms_to_ready parameter

2020-07-29T04:03:52+00:00

The current driver responds to TEST UNIT READY (TUR) with a GOOD status immediately after a scsi_debug device (LU) is created. This is unrealistic as even SSDs take some time after power-on before accepting media access commands. Add the tur_ms_to_ready parameter whose unit is milliseconds (default 0) and is the period before which a TUR (or any media access command) will set the CHECK CONDITION status with a sense key of NOT READY and an additional sense of "Logical unit is in process of becoming ready". The period starts when each scsi_debug device is created. This patch was prompted by T10 proposal 20-061r2 which was accepted on 2020716. It adds that a TUR in the situation described in the previous paragraph may set the INFO field (or descriptor) in the sense data to the estimated number in milliseconds before a subsequent TUR will yield a GOOD status. This patch follows that advice. Link: https://lore.kernel.org/r/20200724155531.668144-1-dgilbert@interlog.com Reported-by: kernel test robot Signed-off-by: Douglas Gilbert Signed-off-by: Martin K. Petersen

scsi: scsi_debug: Fix request sense

2020-07-29T04:00:01+00:00

The SCSI REQUEST SENSE command emulation was found to be broken. It is a quite complex command so try and make it do a subset of what it should do. Remove the attempt to mimic SCSI-1 REQUEST SENSE (i.e. return the sense data for the previous failed command). Add some reporting of "pollable" sense data [see spc6r02: 5.12.2]. Keep the IEC mode page MRIE=6 TEST=1 predictive failure reporting. Link: https://lore.kernel.org/r/20200723194819.545573-1-dgilbert@interlog.com Signed-off-by: Douglas Gilbert Signed-off-by: Martin K. Petersen

scsi: scsi_debug: Update documentation url and bump version

2020-07-14T03:49:47+00:00

This driver maintains a version number which is cross-referenced in the documentation (e.g. to indicate when features are added or changed) and exposed through the responses to various SCSI commands. For example the version number is use as the Product Revision number in standard SCSI INQUIRY responses issued by this driver. The version date string is placed in a vendor specific area in each standard SCSI INQUIRY response. This patch bumps both. Update the driver documentation URL that appears at the top of the driver source file. Link: https://lore.kernel.org/r/20200712182927.72044-3-dgilbert@interlog.com Signed-off-by: Douglas Gilbert Signed-off-by: Martin K. Petersen

scsi: scsi_debug: every_nth triggered error injection

2020-07-14T03:49:46+00:00

This patch simplifies, or at least makes more consistent, the way setting the every_nth parameter injects errors. Here is a list of 'opts' flags and in which cases they inject errors when abs(every_nth)%command_count == 0 is reached: - OPT_RECOVERED_ERR: issued on READ(*)s, WRITE(*)s and WRITE_SCATTEREDs - OPT_DIF_ERR: issued on READ(*)s, WRITE(*)s and WRITE_SCATTEREDs - OPT_DIX_ERR: issued on READ(*)s, WRITE(*)s and WRITE_SCATTEREDs - OPT_SHORT_TRANSFER: issued on READ(*)s - OPT_TRANSPORT_ERR: issued on all commands - OPT_CMD_ABORT: issued on all commands The other uses of every_nth were not modified. Previously if, for example, OPT_SHORT_TRANSFER was armed then if (abs(every_nth) % command_count == 0) occurred during a command that was _not_ a READ, then no error injection occurred. This behaviour puzzled several testers. Now a global "inject_pending" flag is set and the _next_ READ will get hit and that flag is cleared. OPT_RECOVERED_ERR, OPT_DIF_ERR and OPT_DIX_ERR have similar behaviour. A downside of this is that there might be a hang-over pending injection that gets triggered by a following test. Also expand the every_nth runtime parameter so that it can take hex value (i.e. with a leading '0x') as well as a decimal value. Now both the 'opts' and the 'every_nth' runtime parameters can take hexadecimal values. Link: https://lore.kernel.org/r/20200712182927.72044-2-dgilbert@interlog.com Signed-off-by: Douglas Gilbert Signed-off-by: Martin K. Petersen

scsi: scsi_debug: Support hostwide tags

2020-07-14T03:42:48+00:00

Many SCSI HBAs support a hostwide tagset, whereby each command submitted to the HW from all submission queues must have a unique tag identifier. Normally this unique tag will be in the range [0, max queue], where "max queue" is the depth of each of the submission queues. Add support for this hostwide tag feature, via module parameter "host_max_queue". A non-zero value means that the feature is enabled. In this case, the submission queues are not exposed to upper layer, i.e. from blk-mq prespective, the device has a single hw queue. There are 2 reasons for this: a. It is assumed that the host can support nr_hw_queues * can_queue commands, but this is not true for hostwide tags b. For nr_hw_queues != 0, the request tag is not unique over all HW queues, and some HBA drivers want to use this tag for the hostwide tag However, like many SCSI HBA drivers today - megaraid sas being an example - the full set of HW submission queues are still used in the LLDD driver. So instead of using a complicated "reply_map" to create a per-CPU submission queue mapping like megaraid_sas (as it depends on a PCI device + MSIs) - use a simple algorithm: hwq = cpu % queue count If the host_max_queue param is set non-zero, then the max queue depth is fixed at this value also. If and when hostwide shared tags are supported in blk-mq/scsi mid-layer, then the policy to set nr_hw_queues = 0 for hostwide tags can be revised. Link: https://lore.kernel.org/r/1594297400-24756-3-git-send-email-john.garry@huawei.com Acked-by: Douglas Gilbert Signed-off-by: John Garry Signed-off-by: Martin K. Petersen

scsi: scsi_debug: Add check for sdebug_max_queue during module init

2020-07-14T03:42:31+00:00

sdebug_max_queue should not exceed SDEBUG_CANQUEUE, otherwise crashes like this can be triggered by passing an out-of-range value: Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019 pstate: 20400009 (nzCv daif +PAN -UAO BTYPE=--) pc : schedule_resp+0x2a4/0xa70 [scsi_debug] lr : schedule_resp+0x52c/0xa70 [scsi_debug] sp : ffff800022ab36f0 x29: ffff800022ab36f0 x28: ffff0023a935a610 x27: ffff800008e0a648 x26: 0000000000000003 x25: ffff0023e84f3200 x24: 00000000003d0900 x23: 0000000000000000 x22: 0000000000000000 x21: ffff0023be60a320 x20: ffff0023be60b538 x19: ffff800008e13000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000001 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 00000000000000c1 x5 : 0000020000200000 x4 : dead0000000000ff x3 : 0000000000000200 x2 : 0000000000000200 x1 : ffff800008e13d88 x0 : 0000000000000000 Call trace: schedule_resp+0x2a4/0xa70 [scsi_debug] scsi_debug_queuecommand+0x2c4/0x9e0 [scsi_debug] scsi_queue_rq+0x698/0x840 __blk_mq_try_issue_directly+0x108/0x228 blk_mq_request_issue_directly+0x58/0x98 blk_mq_try_issue_list_directly+0x5c/0xf0 blk_mq_sched_insert_requests+0x18c/0x200 blk_mq_flush_plug_list+0x11c/0x190 blk_flush_plug_list+0xdc/0x110 blk_finish_plug+0x38/0x210 blkdev_direct_IO+0x450/0x4d8 generic_file_read_iter+0x84/0x180 blkdev_read_iter+0x3c/0x50 aio_read+0xc0/0x170 io_submit_one+0x5c8/0xc98 __arm64_sys_io_submit+0x1b0/0x258 el0_svc_common.constprop.3+0x68/0x170 do_el0_svc+0x24/0x90 el0_sync_handler+0x13c/0x1a8 el0_sync+0x158/0x180 Code: 528847e0 72a001e0 6b00003f 540018cd (3941c340) In addition, it should not be less than 1. So add checks for these, and fail the module init for those cases. [mkp: changed if condition to match error message] Link: https://lore.kernel.org/r/1594297400-24756-2-git-send-email-john.garry@huawei.com Fixes: c483739430f1 ("scsi_debug: add multiple queue support") Reviewed-by: Ming Lei Acked-by: Douglas Gilbert Signed-off-by: John Garry Signed-off-by: Martin K. Petersen

scsi: scsi_debug: Fix in_use bitmap corruption

2020-07-03T03:49:54+00:00

Heavy testing indicates the irqsave() spinlock around the __set_bit() is insufficient to stop following clear_bit() calls being rarely applied out-of-order. Also the nearby failed kzalloc() path leading to SCSI_MLQUEUE_HOST_BUSY does not properly undo the in_use bitmap and num_in_q, fix. Link: https://lore.kernel.org/r/20200702145355.522283-1-dgilbert@interlog.com Signed-off-by: Douglas Gilbert Signed-off-by: Martin K. Petersen

scsi: scsi_debug: Parser tables and code interaction

2020-05-20T02:25:52+00:00

This patch is in response to a static analyser report from Dan Carpenter titled: "[bug report] scsi: scsi_debug: Add per_host_store option". This code may not clear the static analyzer reports, but may shed light on why they occur. Amongst other things this driver has a table driven SCSI command parser which also involves some C code. There are some invariants between the table entries and the corresponding C code (i.e. the resp_*() functions) that, if broken, may lead to a NULL dereference. And the report is valid, at least in the case of the PRE-FETCH command. Alas, that is not one of the cases that the static analyzer reported. In this particular corner case: when the fake_rw flag is set and the table entry for a "store"-accessing command does not have the required F_FAKE_RW flag set, do the following. Call BUG_ON() in the devip2sip() very close to a comment block explaining why it was called and how to fix it. checkpatch.pl complains about the BUG_ON() but there is no reasonable remedial action that can be taken at run time. This change allows the code reported by the static analyzer to be simplified. Comments were also added to the table flags (e.g. F_FAKE_RW) so developers who add commands might be more inclined to use them (properly). Link: https://lore.kernel.org/r/20200513013943.25285-1-dgilbert@interlog.com Reported-by: Dan Carpenter Signed-off-by: Douglas Gilbert Signed-off-by: Martin K. Petersen