From 72d8c36ec364c82bf1bf0c64dfa1041cfaf139f7 Mon Sep 17 00:00:00 2001 From: Wei Fang Date: Tue, 7 Jun 2016 14:53:56 +0800 Subject: scsi: fix race between simultaneous decrements of ->host_failed sas_ata_strategy_handler() adds the works of the ata error handler to system_unbound_wq. This workqueue asynchronously runs work items, so the ata error handler will be performed concurrently on different CPUs. In this case, ->host_failed will be decreased simultaneously in scsi_eh_finish_cmd() on different CPUs, and become abnormal. It will lead to permanently inequality between ->host_failed and ->host_busy, and scsi error handler thread won't start running. IO errors after that won't be handled. Since all scmds must have been handled in the strategy handler, just remove the decrement in scsi_eh_finish_cmd() and zero ->host_busy after the strategy handler to fix this race. Fixes: 50824d6c5657 ("[SCSI] libsas: async ata-eh") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang Reviewed-by: James Bottomley Signed-off-by: Martin K. Petersen diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt index 8638f61..37eca00 100644 --- a/Documentation/scsi/scsi_eh.txt +++ b/Documentation/scsi/scsi_eh.txt @@ -263,19 +263,23 @@ scmd->allowed. 3. scmd recovered ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd - - shost->host_failed-- - clear scmd->eh_eflags - scsi_setup_cmd_retry() - move from local eh_work_q to local eh_done_q LOCKING: none + CONCURRENCY: at most one thread per separate eh_work_q to + keep queue manipulation lockless 4. EH completes ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper - layer of failure. + layer of failure. May be called concurrently but must have + a no more than one thread per separate eh_work_q to + manipulate the queue locklessly - scmd is removed from eh_done_q and scmd->eh_entry is cleared - if retry is necessary, scmd is requeued using scsi_queue_insert() - otherwise, scsi_finish_command() is invoked for scmd + - zero shost->host_failed LOCKING: queue or finish function performs appropriate locking diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 961acc7..91a9e6a 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -606,7 +606,7 @@ void ata_scsi_error(struct Scsi_Host *host) ata_scsi_port_error_handler(host, ap); /* finish or retry handled scmd's and clean up */ - WARN_ON(host->host_failed || !list_empty(&eh_work_q)); + WARN_ON(!list_empty(&eh_work_q)); DPRINTK("EXIT\n"); } diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 984ddcb..1b9c049 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1127,7 +1127,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn) */ void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) { - scmd->device->host->host_failed--; scmd->eh_eflags = 0; list_move_tail(&scmd->eh_entry, done_q); } @@ -2226,6 +2225,9 @@ int scsi_error_handler(void *data) else scsi_unjam_host(shost); + /* All scmds have been handled */ + shost->host_failed = 0; + /* * Note - if the above fails completely, the action is to take * individual devices offline and flush the queue of any -- cgit v0.10.2 From 8beb330044d0d1878c7b92290e91c0b889e92633 Mon Sep 17 00:00:00 2001 From: James Bottomley Date: Mon, 13 Jun 2016 22:00:07 -0700 Subject: 53c700: fix BUG on untagged commands The untagged command case in the 53c700 driver has been broken since host wide tags were enabled because the replaced scsi_find_tag() function had a special case for the tag value SCSI_NO_TAG to retrieve sdev->current_cmnd. The replacement function scsi_host_find_tag() has no such special case and returns NULL causing untagged commands to trigger a BUG() in the driver. Inspection shows that the 53c700 is the only driver using this SCSI_NO_TAG case, so a local fix in the driver suffices to fix this problem globally. Fixes: 64d513ac31b - "scsi: use host wide tags by default" Cc: stable@vger.kernel.org # 4.4+ Reported-by: Helge Deller Tested-by: Helge Deller Signed-off-by: James Bottomley Reviewed-by: Johannes Thumshirn Reviewed-by: Ewan D. Milne Acked-by: Christoph Hellwig Signed-off-by: Martin K. Petersen diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c index d4c2856..3ddc85e 100644 --- a/drivers/scsi/53c700.c +++ b/drivers/scsi/53c700.c @@ -1122,7 +1122,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp, } else { struct scsi_cmnd *SCp; - SCp = scsi_host_find_tag(SDp->host, SCSI_NO_TAG); + SCp = SDp->current_cmnd; if(unlikely(SCp == NULL)) { sdev_printk(KERN_ERR, SDp, "no saved request for untagged cmd\n"); @@ -1826,7 +1826,7 @@ NCR_700_queuecommand_lck(struct scsi_cmnd *SCp, void (*done)(struct scsi_cmnd *) slot->tag, slot); } else { slot->tag = SCSI_NO_TAG; - /* must populate current_cmnd for scsi_host_find_tag to work */ + /* save current command for reselection */ SCp->device->current_cmnd = SCp; } /* sanity check: some of the commands generated by the mid-layer -- cgit v0.10.2