summaryrefslogtreecommitdiff
path: root/drivers/ata/libata-eh.c
AgeCommit message (Collapse)AuthorFilesLines
2007-10-12libata-link: update EH to deal with PMP linksTejun Heo1-70/+158
Update ata_eh_autopsy(), ata_eh_report(), ata_eh_revalidate_and_attach() and ata_eh_recover() to deal with PMP links. ata_eh_autopsy() and ata_eh_report() updates are straightforward. They just repeat the same operation over all configured links. The only change to ata_eh_revalidate_and_attach() is avoiding calling ->cable_select() on non-host ports. ata_eh_recover() update is more complex as it first processes all resets and then performs the rest. This is necessary as thawing with some links in unknown state can be dangerous. ehi->action is cleared on successful recovery of a link to avoid repeating recovery due to failures in other links. ata_eh_recover() iterates over only PMP links if PMP is attached, and, on failure, the failing link is returned in @failed_link instead of disabling devices directly. These are to integrate ata_eh_recover() into PMP EH later. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12libata-link: update ata_scsi_error() to handle PMP linksTejun Heo1-4/+9
Update ata_scsi_error() to handle PMP links. As error conditions can occur on both host and PMP links, __ata_port_for_each_link() is used. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12libata-link: implement ata_link_abort()Tejun Heo1-14/+36
Implement ata_link_abort(). Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12libata-link: linkify config/EH related functionsTejun Heo1-58/+63
Make the following functions deal with ata_link instead of ata_port. * ata_set_mode() * ata_eh_autopsy() and related functions * ata_eh_report() and related functions * suspend/resume related functions * ata_eh_recover() and related functions Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12libata-link: linkify resetTejun Heo1-18/+17
Make reset methods and related functions deal with ata_link instead of ata_port. * ata_do_reset() * ata_eh_reset() * all prereset/reset/postreset methods and related functions This patch introduces no behavior change. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12libata-link: linkify EH action helpersTejun Heo1-17/+18
Make ata_eh_about_to_do() and ata_eh_done() deal with ata_link instead of ata_port. This patch introduces no behavior change. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12libata-link: linkify PHY-related functionsTejun Heo1-15/+16
Make the following PHY-related functions to deal with ata_link instead of ata_port. * sata_print_link_status() * sata_down_spd_limit() * ata_set_sata_spd_limit() and friends * sata_link_debounce/resume() * sata_scr_valid/read/write/write_flush() * ata_link_on/offline() This patch introduces no behavior change. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12libata-link: implement and use link/device iteratorsTejun Heo1-58/+59
Multiple links and different number of devices per link should be considered to iterate over links and devices. This patch implements and uses link and device iterators - ata_port_for_each_link() and ata_link_for_each_dev() - and ata_link_max_devices(). This change makes a lot of functions iterate over only possible devices instead of from dev 0 to dev ATA_MAX_DEVICES. All such changes have been examined and nothing should be broken. While at it, add a separating comment before device helpers to distinguish them better from link helpers and others. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12libata-link: introduce ata_linkTejun Heo1-49/+51
Introduce ata_link. It abstracts PHY and sits between ata_port and ata_device. This new level of abstraction is necessary to support SATA Port Multiplier, which basically adds a bunch of links (PHYs) to a ATA host port. Fields related to command execution, spd_limit and EH are per-link and thus moved to ata_link. This patch only defines the host link. Multiple link handling will be added later. Also, a lot of ap->link derefences are added but many of them will be removed as each part is converted to deal directly with ata_link instead of ata_port. This patch introduces no behavior change. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-20libata: implement EH fast drainTejun Heo1-2/+97
In most cases, when EH is scheduled, all in-flight commands are aborted causing EH to kick in immediately. However, in some cases (especially with PMP), it's unclear which commands are affected by the error condition and although aborting all in-flight commands work, it isn't optimal and may cause unnecessary disruption. On the other hand, waiting for in-flight commands to drain themselves can take up to 30seconds. This patch implements EH fast drain to handle such situations. It gives in-flight commands some time to finish up but doesn't wait for too long. After EH is scheduled, fast drain timer is started and if no other completion occurs in ATA_EH_FASTDRAIN_INTERVAL all in-flight commands are aborted. If any completion occurred in the interval, the port is given another interval to finish up itself. Currently ATA_EH_FASTDRAIN_INTERVAL is 3 secs which should be enough for finishing up most commands. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-20libata: schedule probing after SError access failure during autopsyTejun Heo1-1/+5
If SError isn't accessible, EH can't tell whether hotplug has happened or not. Report SError read failure with AC_ERR_OTHER and schedule probing with hardreset. This will be mainly useful for PMPs. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-20libata: clear HOTPLUG flag after a resetTejun Heo1-5/+10
ATA_EHI_HOTPLUGGED is a hint for reset functions indicating the the port might have gone through hotplug/unplug just before entering EH. Reset functions modify their behaviors a bit to handle the situation better - e.g. using longer debouncing delay. Currently, once HOTPLUG is set, it isn't cleared till the end of EH. This is unnecessary and makes EH take longer. Clear the HOTPLUGGED flag after a reset try (successful or not). Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-20libata: quickly trigger SATA SPD down after debouncing failedTejun Heo1-1/+1
Debouncing failure is a good indicator of basic link problem. Use -EPIPE to indicate debouncing failure and make ata_eh_reset() invoke sata_down_spd_limit() if the error occurs during reset. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-20libata: improve SATA PHY speed down logicTejun Heo1-0/+6
sata_down_spd_limit() first reads the current SPD from SStatus and limit the speed to the lower one of one below the current limit or one below the current SPD in SStatus. SPD may not be accessible or valid when SPD down is requested making sata_down_spd_limit() fail when it's most needed. This patch makes the current SPD cached after each successful reset and forces GEN I speed (1.5Gbps) if neither of SStatus or the cached value is valid, so sata_down_spd_limit() is now guaranteed to lower the speed limit if lower speed is available. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-20libata: implement AC_ERR_NCQTejun Heo1-3/+4
When an NCQ command fails, all commands in flight are aborted and the offending one is reported using log page 10h. Depending on controller characteristics and LLD implementation, all commands may appear as having a device error due to shared TF status making it hard to determine what's actually going on. This patch adds AC_ERR_NCQ, marks the command reported by log page 10h with it and print extra "<F>" after the error report for the command to help distinguishing the offending command. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-20libata: improve EH report formattingTejun Heo1-2/+67
Requiring LLDs to format multiple error description messages properly doesn't work too well. Help LLDs a bit by making ata_ehi_push_desc() insert ", " on each invocation. __ata_ehi_push_desc() is the raw version without the automatic separator. While at it, make ehi_desc interface proper functions instead of macros. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-11libata-link: separate out ata_eh_handle_dev_fail()Tejun Heo1-44/+52
Separate out ata_eh_handle_dev_fail() from ata_eh_recover(). This is in preparation of ata_link and PMP support. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-09libata-acpi: implement _GTM/_STM supportTejun Heo1-1/+7
Implement _GTM/_STM support. acpi_gtm is added to ata_port which stores _GTM parameters over suspend/resume cycle. A new hook ata_acpi_on_suspend() is responsible for storing _GTM parameters during suspend. _STM is executed in ata_acpi_on_resume(). With this change, invoking _GTF is safe on IDE hierarchy and acpi_sata check before _GTF is removed. ata_acpi_gtm() and ata_acpi_stm() implementation is taken from Alan Cox's pata_acpi implementation. ata_acpi_gtm() is fixed such that the result parameter is not shifted by sizeof(union acpi_object). Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-09libata: reimplement ACPI invocationTejun Heo1-0/+3
This patch reimplements ACPI invocation such that, instead of exporting ACPI details to the rest of libata, ACPI event handlers - ata_acpi_on_resume() and ata_acpi_on_devcfg() - are used. These two functions are responsible for determining whether specific ACPI method is used and when. On resume, _GTF is scheduled by setting ATA_DFLAG_ACPI_PENDING device flag. This is done this way to avoid performing the action on wrong device device (device swapping while suspended). On every ata_dev_configure(), ata_acpi_on_devcfg() is called, which performs _SDD and _GTF. _GTF is performed only after resuming and, if SATA, hardreset as the ACPI spec specifies. As _GTF may contain arbitrary commands, IDENTIFY page is re-read after _GTF taskfiles are executed. If one of ACPI methods fails, ata_acpi_on_devcfg() retries on the first failure. If it fails again on the second try, ACPI is disabled on the device. Note that successful configuration clears ACPI failed status. With all feature checks moved to the above two functions, do_drive_set_taskfiles() is trivial and thus collapsed into ata_acpi_exec_tfs(), which is now static and converted to return the number of executed taskfiles to be used by ata_acpi_on_resume(). As failures are handled properly, ata_acpi_push_id() now returns -errno on errors instead of unconditional zero. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-06-27libata: fix infinite EH waiting bugTejun Heo1-0/+1
When EH gives up after repeated exceptions, it doesn't't clear the PENDING bit on exit which leaves PENDING bit set without EH actually scheduled. This makes ata_port_wait_eh() to wait forever makes rmmod hang on such port. Fix it by clearing the flag. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-06-27libata: remove unused variable from ata_eh_reset()Tejun Heo1-3/+1
Removed unused variable did_followup_srst from ata_eh_reset(). Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-06-27libata: kill non-sense warning messageTejun Heo1-2/+0
prereset() is now allowed to set flag for unsupported reset method. EH layer is responsible for selecting the fallback. Remove non-sense warning message. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-22libata: Trim trailing whitespaceJeff Garzik1-1/+1
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-12libata: give devices one last chance even if recovery failed with -EINVALTejun Heo1-5/+1
After certain errors, some devices report complete garbage on IDENTIFY. This can cause ata_dev_read_id() to fail with -EINVAL resulting in immediate disabling of the device. Give the device one last chance after -EINVAL to allow recovery from such situations. As -EINVAL is triggered very rarely, this shouldn't cause any noticeable affect on more common error paths. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Harald Dunkel <harald.dunkel@t-online.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-12libata: ignore EH scheduling during initializationTejun Heo1-0/+3
libata enables SCSI host during ATA host activation which happens after IRQ handler is registered and IRQ is enabled. All ATA ports are in frozen state when IRQ is enabled but frozen ports may raise limited number of IRQs after being frozen - IOW, ->freeze() is not responsible for clearing pending IRQs. During normal operation, the IRQ handler is responsible for clearing spurious IRQs on frozen ports and it usually doesn't require any extra code. Unfortunately, during host initialization, the IRQ handler can end up scheduling EH for a port whose SCSI host isn't initialized yet. This results in OOPS in the SCSI midlayer. This is relatively short window and scheduling EH for probing is the first thing libata does after initialization, so ignoring EH scheduling until initialization is complete solves the problem nicely. This problem was spotted by Berck E. Nash in the following thread. http://thread.gmane.org/gmane.linux.kernel/519412 Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Berck E. Nash <flyboy@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-12libata: reimplement suspend/resume support using sdev->manage_start_stopTejun Heo1-233/+4
Reimplement suspend/resume support using sdev->manage_start_stop. * Device suspend/resume is now SCSI layer's responsibility and the code is simplified a lot. * DPM is dropped. This also simplifies code a lot. Suspend/resume status is port-wide now. * ata_scsi_device_suspend/resume() and ata_dev_ready() removed. * Resume now has to wait for disk to spin up before proceeding. I couldn't find easy way out as libata is in EH waiting for the disk to be ready and sd is waiting for EH to complete to issue START_STOP. * sdev->manage_start_stop is set to 1 in ata_scsi_slave_config(). This fixes spindown on shutdown and suspend-to-disk. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-01libata: reimplement reset sequencingTejun Heo1-17/+41
libata previously depended upon waits in prereset to get resets after hotplug right for both spin up and device ready wait. This was necessary both for reliablity and speed as reset was likely to fail if initiated too early and each try usually took more than 30secs to fail. Previous patches fixed the reliability part by fixing status and SCR handling in resets. This patch remedies the speed part by improving reset sequencing. Prereset waiting timeout is adjusted to 10s because spinup wait is replaced by reset sequencing and !BSY wait is not as important as before. During boot or module loading where the drive is already fully spun up, !BSY wait succeeds immediately, so 10s should be enough in most cases. It matters after hotplugging or other error conditions, but in those cases, !BSY wait in prereset simply can't be relied upon due to the varied and weird behaviors ATA controllers and devices show. Reset is now driven by ata_eh_reset_timeouts[] table which contains timeouts for each reset try. The first reset can be softreset but the following ones are always hardreset if available. Each timeout defines deadline for the reset try. If a reset try fails, reset is retried with the next timeout till the end of the timeout table is reached. If a reset try fails before the timeout with error, libata waits till the deadline of the failed try before retrying. IOW, the timeout table defines timetable of reset tries such that the n'th try always begins at least after the sum of all previous timeouts has passed. The current timetable defines 4 tries and takes around 1 minute. @0 : First try. This should succeed most of the time during boot. @10 : 10s is enough to spin up most consumer harddrives. Give it another shot. @20 : 20s should spin up > 99% of working drives. This has 30s timeout for retarded devices needing long idleness post reset. @55 : Final try with 5s timeout just in case. The above timetable is trade off between not annoying the device too much with frequent resets and taking reasonable amount of time in most cases. Some controllers may do better with shorter timeouts while others may fare better with longer but we just can't rely upon LLD writers to test each controller with wide variety of devices using various scenarios. We need default behavior which reasonably fits most cases. I've tested the above timetable on a dozen SATA controllers and a few PATA controllers with about a dozen different drives from all major vendors and 4 different ODDs from three different vendors for both boot and hotplug (if available) cases. Boot probing is not affected unless the device is broken in which cases new code gives up on the port after a minute rather than five or nine minutes. When hotplugging, most devices get detected on the first or second try. Multi-platter drives with long spin up time which sometimes took > 40 secs with the original code, now usually comes up during the second try and at least right after the third try @20. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-01libata: add deadline support to prereset and reset methodsTejun Heo1-5/+5
Add @deadline to prereset and reset methods and make them honor it. ata_wait_ready() which directly takes @deadline is implemented to be used as the wait function. This patch is in preparation for EH timing improvements. * ata_wait_ready() never does busy sleep. It's only used from EH and no wait in EH is that urgent. This function also prints 'be patient' message automatically after 5 secs of waiting if more than 3 secs is remaining till deadline. * ata_bus_post_reset() now fails with error code if any of its wait fails. This is important because earlier reset tries will have shorter timeout than the spec requires. If a device fails to respond before the short timeout, reset should be retried with longer timeout rather than silently ignoring the device. There are three behavior differences. 1. Timeout is applied to both devices at once, not separately. This is more consistent with what the spec says. 2. When a device passes devchk but fails to become ready before deadline. Previouly, post_reset would just succeed and let device classification remove the device. New code fails the reset thus causing reset retry. After a few times, EH will give up disabling the port. 3. When slave device passes devchk but fails to become accessible (TF-wise) after reset. Original code disables dev1 after 30s timeout and continues as if the device doesn't exist, while the patched code fails reset. When this happens, new code fails reset on whole port rather than proceeding with only the primary device. If the failing device is suffering transient problems, new code retries reset which is a better behavior. If the failing device is actually broken, the net effect is identical to it, but not to the other device sharing the channel. In the previous code, reset would have succeeded after 30s thus detecting the working one. In the new code, reset fails and whole port gets disabled. IMO, it's a pathological case anyway (broken device sharing bus with working one) and doesn't really matter. * ata_bus_softreset() is changed to return error code from ata_bus_post_reset(). It used to return 0 unconditionally. * Spin up waiting is to be removed and not converted to honor deadline. * To be on the safe side, deadline is set to 40s for the time being. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-04-28libata: separate ATA_EHI_DID_RESET into DID_SOFTRESET and DID_HARDRESETTejun Heo1-1/+4
Separate ATA_EHI_DID_RESET into ATA_EHI_DID_SOFTRESET and ATA_EHI_DID_HARDRESET. ATA_EHI_DID_RESET is redefined as OR of the two flags. This patch doesn't introduce any behavior change. This will be used later to determine whether _SDD is necessary or not. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-04-28libata: add missing call to ->cable_detect() in new EH pathTejun Heo1-0/+4
->cable_detect() used to be called on by the old ata_bus_probe() path. Add invocation to ata_eh_revalidate_and_attach() right after IDENTIFYs are done. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-04-28libata: improve AC_ERR_DEV handling for ->post_internal_cmdTejun Heo1-1/+3
->post_internal_cmd is simplified EH for internal commands. Its primary mission is to stop the controller such that no rogue memory access or other activities occur after the internal command is released. It may provide error diagnostics by setting qc->err_mask but this hasn't been a requirement. To ignore SETXFER failure for CFA devices, libata needs to know whether a command was failed by the device or for any other reason. ie. internal command needs to get AC_ERR_DEV right. This patch makes the following changes to AC_ERR_DEV handling and ->post_internal_cmd semantics to accomodate this need and simplify callback implementation. 1. As long as the correct bits in the result TF registers are set, there is no need to set AC_ERR_DEV explicitly. libata EH core takes care of that for both normal and internal commands. 2. The only requirement for ->post_internal_cmd() is to put the controller into quiescent state. It needs not to set any err_mask. 3. ata_exec_internal_sg() performs minimal error analysis such that AC_ERR_DEV is automatically set as long as result_tf is filled correctly. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-04-28libata: hardreset on SERR_INTERNALTejun Heo1-1/+1
There was a rare report where SB600 reported SERR_INTERNAL and SRST couldn't get it out of the failure mode. Hardreset on SERR_INTERNAL. As the problem is intermittent, whether this fixes the problem or not hasn't been verified yet, but hardresetting the channel on internal error is a good idea anyway. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-04-04libata: Clear tf before doing request sense (take 3)Albert Lee1-11/+11
patch 2/4: Clear tf before doing request sense. This fixes the AOpen 56X/AKH timeout problem. (http://bugzilla.kernel.org/show_bug.cgi?id=8244) Signed-off-by: Albert Lee <albertcc@tw.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-03-28libata: IDENTIFY backwards for drive side cable detectionTejun Heo1-24/+42
For drive side cable detection to work correctly, drives need to be identified backwards such that the slave device releases PDIAG- before the mater drive tries to detect cable type. ata_bus_probe() was fixed by commit f31f0cc2f0b7527072d94d02da332d9bb8d7d94c but the new EH path wasn't fixed. This patch makes new EH path do IDENTIFY backwards. ata_dev_configure() for new devices are still performed master first. This is to keep the detection messages in forward order. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-03-19libata: don't whine if ->prereset() returns -ENOENTTejun Heo1-1/+7
->prereset() returns -ENOENT to tell libata that the port is empty and reset sequencing should be stopped. This is not an error condition. Update ata_eh_reset() such that it sets device classes to ATA_DEV_NONE and return success in on -ENOENT. This makes spurious error message go away. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-03-03libata: add CONFIG_PM to libata core layerTejun Heo1-0/+29
Conditionalize all PM related stuff in libata core layer using CONFIG_PM. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-02-21libata: s/ap->id/ap->print_id/gTejun Heo1-2/+2
ata_port has two different id fields - id and port_no. id is system-wide 1-based unique id for the port while port_no is 0-based host-wide port number. The former is primarily used to identify the ATA port to the user in printk messages while the latter is used in various places in libata core and LLDs to index the port inside the host. The two fields feel quite similar and sometimes ap->id is used in place of ap->port_no, which is very difficult to spot. This patch renames ap->id to ap->print_id to reduce the possibility of such bugs. Some printk messages are adjusted such that id string (ata%u[.%u]) isn't printed twice and/or to use ata_*_printk() instead of hardcoded id format. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-02-21libata: put some intelligence into EH speed down sequenceTejun Heo1-60/+124
The current EH speed down code is more of a proof that the EH framework is capable of adjusting transfer speed in response to error. This patch puts some intelligence into EH speed down sequence. The rules are.. * If there have been more than three timeout, HSM violation or unclassified DEV errors for known supported commands during last 10 mins, NCQ is turned off. * If there have been more than three timeout or HSM violation for known supported command, transfer mode is slowed down. If DMA is active, it is first slowered by one grade (e.g. UDMA133->100). If that doesn't help, it's slowered to 40c limit (UDMA33). If PIO is active, it's slowered by one grade first. If that doesn't help, PIO0 is forced. Note that this rule does not change transfer mode. DMA is never degraded into PIO by this rule. * If there have been more than ten ATA bus, timeout, HSM violation or unclassified device errors for known supported commands && speeding down DMA mode didn't help, the device is forced into PIO mode. Note that this rule is considered only for PATA devices and is pretty difficult to trigger. One error can only trigger one rule at a time. After a rule is triggered, error history is cleared such that the next speed down happens only after some number of errors are accumulated. This makes sense because now speed down is done in bigger stride. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-02-21libata: improve probe failure handlingTejun Heo1-17/+18
* Move forcing device to PIO0 on device disable into ata_dev_disable(). This makes both old and new EHs act the same way. * Speed down only PIO mode on probe failure. All commands used during probing are PIO commands. There's no point in speeding down DMA. * Retry at least once after -ENODEV. Some devices report garbled IDENTIFY data after certain events. This shouldn't cause device detach and re-attach. * Rearrange EH failure path for simplicity. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-02-21libata: improve ata_down_xfermask_limit()Tejun Heo1-3/+6
Make ata_down_xfermask_limit() accept @sel instead of @force_pio0. @sel selects how the xfermask limit will be adjusted. The following selectors are defined. * ATA_DNXFER_PIO : only speed down PIO * ATA_DNXFER_DMA : only speed down DMA, don't cause transfer mode change * ATA_DNXFER_40C : apply 40c cable limit * ATA_DNXFER_FORCE_PIO : force PIO * ATA_DNXFER_FORCE_PIO0 : force PIO0 (same as original with @force_pio0 == 1) * ATA_DNXFER_ANY : same as original with @force_pio0 == 0 Currently, only ANY and FORCE_PIO0 are used to maintain the original behavior. Other selectors will be used later to improve EH speed down sequence. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-02-10libata: kill qc->nsect and cursectTejun Heo1-6/+1
libata used two separate sets of variables to record request size and current offset for ATA and ATAPI. This is confusing and fragile. This patch replaces qc->nsect/cursect with qc->nbytes/curbytes and kills them. Also, ata_pio_sector() is updated to use bytes for qc->cursg_ofs instead of sectors. The field used to be used in bytes for ATAPI and in sectors for ATA. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-01-27libata: fix ata_eh_suspend() return valueTejun Heo1-1/+1
ata_eh_suspend() was returning 0 regardless of failure. This bug has potential to lose data on suspend. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-01-20libata: fix handling of port actions in per-dev action maskTejun Heo1-0/+4
libata EH ignores port-wide actions in per-dev action mask. However, device resume requests EH_SOFTRESET using per-dev action mask. Under certain circumstances, this results in not resetting frozen port after resuming which causes failure of all commands. This patch allows port-wide actions to be requested in per-dev action mask. Before EH recovery starts, port-wide actions will be collected. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-05Merge branch 'master' of ↵David Howells1-29/+79
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-03[PATCH] libata: always use polling IDENTIFYTejun Heo1-3/+0
libata switched to IRQ-driven IDENTIFY when IRQ-driven PIO was introduced. This has caused a lot of problems including device misdetection and phantom device. ATA_FLAG_DETECT_POLLING was added recently to selectively use polling IDENTIFY on problemetic drivers but many controllers and devices are affected by this problem and trying to adding ATA_FLAG_DETECT_POLLING for each such case is diffcult and not very rewarding. This patch makes libata always use polling IDENTIFY. This is consistent with libata's original behavior and drivers/ide's behavior. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-03[PATCH] libata: don't request sense if the port is frozenTejun Heo1-13/+15
If EH command is issued to a frozen port, it fails with AC_ERR_SYSTEM. libata used to request sense even when the port is frozen needlessly adding AC_ERR_SYSTEM to err_mask. Don't do it. Signed-off-by: Tejun Heo <htejun@gmail.com>
2006-12-02[PATCH] libata: print cdb[0] in failed qc reportTejun Heo1-2/+3
Print cdb[0] in failed qc report. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-02[PATCH] libata: improve failed qc reportingTejun Heo1-5/+27
Improve failed qc reporting. The original message didn't include the actual command nor full error status and it was necessary to temporarily patch the code to find out exactly which command is causing problem. This patch makes EH report full command and result TFs along with data direction and length. This change will make bug reports more useful. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-02[PATCH] libata: implement presence detection via polling IDENTIFYTejun Heo1-5/+18
On some controllers (ICHs in piix mode), there is *NO* reliable way to determine device presence other than issuing IDENTIFY and see how the transaction proceeds by watching the TF status register. libata acted this way before irq-pio and phantom devices caused very little problem but now that IDENTIFY is performed using IRQ drive PIO, such phantom devices now result in multiple 30sec timeouts during boot. This patch implements ATA_FLAG_DETECT_POLLING. If a LLD sets this flag, libata core issues the initial IDENTIFY in polling mode and if the initial data transfer fails w/ HSM violation, the port is considered to be empty thus replicating the old libata and IDE behavior. Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-12-02[PATCH] libata: convert @post_reset to @flags in ata_dev_read_id()Tejun Heo1-4/+7
Make ata_dev_read_id() take @flags instead of @post_reset. Currently there is only one flag defined - ATA_READID_POSTRESET, which is equivalent to @post_reset. This is preparation for polling presence detection. Signed-off-by: Jeff Garzik <jeff@garzik.org>