Age | Commit message (Collapse) | Author |
|
commit ffb58456589443ca572221fabbdef3db8483a779 upstream.
mpt3sas has a firmware failure where it can only handle one pass through
ATA command at a time. If another comes in, contrary to the SAT
standard, it will hang until the first one completes (causing long
commands like secure erase to timeout). The original fix was to block
the device when an ATA command came in, but this caused a regression
with
commit 669f044170d8933c3d66d231b69ea97cb8447338
Author: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Tue Nov 22 16:17:13 2016 -0800
scsi: srp_transport: Move queuecommand() wait code to SCSI core
So fix the original fix of the secure erase timeout by properly
returning SAM_STAT_BUSY like the SAT recommends. The original patch
also had a concurrency problem since scsih_qcmd is lockless at that
point (this is fixed by using atomic bitops to set and test the flag).
[mkp: addressed feedback wrt. test_bit and fixed whitespace]
Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination)
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9373eba6cfae48911b977d14323032cd5d161aae upstream.
The call to scsi_is_sas_rphy() needs to be made on the SAS end_device,
not on the SCSI device.
Fixes: 835831c57e9b ("ses: use scsi_is_sas_rphy instead of is_sas_attached")
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 387b978cb0d12cf3720ecb17e652e0a9991a08e2 upstream.
Current code incorrectly calculates the max transfer length, since
it is assuming a 4k page table, but ppc64 all run on 64k page tables.
Reported-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a5b0e4062fb225155189e593699bbfcd0597f8b5 upstream.
Currently, dma_alloc_coherent is being called with a GFP_KERNEL
flag which allows it to sleep in an interrupt context, need to
change to GFP_ATOMIC.
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Reviewed-by: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc1ffd6cb38a1c1af625b9833c41928039e733f5 upstream.
During code inspection, while investigating following stack trace
seen on one of the test setup, we found out there was possibility
of memory leak becuase driver was not unwinding the stack properly.
This issue has not been reproduced in a test environment or on a
customer setup.
Here's stack trace that was seen.
[1469877.797315] Call Trace:
[1469877.799940] [<ffffffffa03ab6e9>] qla2x00_mem_alloc+0xb09/0x10c0 [qla2xxx]
[1469877.806980] [<ffffffffa03ac50a>] qla2x00_probe_one+0x86a/0x1b50 [qla2xxx]
[1469877.814013] [<ffffffff813b6d01>] ? __pm_runtime_resume+0x51/0xa0
[1469877.820265] [<ffffffff8157c1f5>] ? _raw_spin_lock_irqsave+0x25/0x90
[1469877.826776] [<ffffffff8157cd2d>] ? _raw_spin_unlock_irqrestore+0x6d/0x80
[1469877.833720] [<ffffffff810741d1>] ? preempt_count_sub+0xb1/0x100
[1469877.839885] [<ffffffff8157cd0c>] ? _raw_spin_unlock_irqrestore+0x4c/0x80
[1469877.846830] [<ffffffff81319b9c>] local_pci_probe+0x4c/0xb0
[1469877.852562] [<ffffffff810741d1>] ? preempt_count_sub+0xb1/0x100
[1469877.858727] [<ffffffff81319c89>] pci_call_probe+0x89/0xb0
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[ bvanassche: Fixed spelling in patch description ]
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7c9d8d0c41b3e24473ac7648a7fc2d644ccf08ff upstream.
If srp_transfer_data fails within ibmvscsis_write_pending, then
the most likely scenario is that the client timed out the op and
removed the TCE mapping. Thus it will loop forever retrying the
op that is pretty much guaranteed to fail forever. A better return
code would be EIO instead of EAGAIN.
Reported-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Bryant G. Ly <bgly@us.ibm.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit af15769ffab13d777e55fdef09d0762bf0c249c4 upstream.
gcc-7 notices that the condition in mvs_94xx_command_active looks
suspicious:
drivers/scsi/mvsas/mv_94xx.c: In function 'mvs_94xx_command_active':
drivers/scsi/mvsas/mv_94xx.c:671:15: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context]
This was introduced when the mv_printk() statement got added, and leads
to the condition being ignored. This is probably harmless.
Changing '&&' to '&' makes the code look reasonable, as we check the
command bit before setting and printing it.
Fixes: a4632aae8b66 ("[SCSI] mvsas: Add new macros and functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7b93ca43b7e21fbe6fb1a6f4ecce4a2f70f424a0 upstream.
When a SW-configurable card is specified but not found, the driver
releases wrong region, causing the following message in kernel log:
Trying to free nonexistent resource <0000000000000000-000000000000000f>
Fix it by assigning base earlier.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Fixes: a8cfbcaec0c1 ("scsi: g_NCR5380: Stop using scsi_module.c")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 128394eff343fc6d2f32172f03e24829539c5835 upstream.
Both damn things interpret userland pointers embedded into the payload;
worse, they are actually traversing those. Leaving aside the bad
API design, this is very much _not_ safe to call with KERNEL_DS.
Bail out early if that happens.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ae2aae2421983f6f68eb7c4692624bc43ea50712 upstream.
Controllers with this PCI ID never shipped outside of
PMCS/Microsemi. Remove the ID from the aacraid driver. smartpqi is the
correct driver for these controllers.
[mkp: patch description]
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d2a145252c52792bc59e4767b486b26c430af4bb upstream.
A race between scanning and fc_remote_port_delete() may result in a
permanent stop if the device gets blocked before scsi_sysfs_add_sdev()
and unblocked after. The reason is that blocking a device sets both the
SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED. However,
scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the
device to be ignored by scsi_target_unblock() and thus never have its
QUEUE_FLAG_STOPPED cleared leading to a device which is apparently
running but has a stopped queue.
We actually have two places where SDEV_RUNNING is set: once in
scsi_add_lun() which respects the blocked flag and once in
scsi_sysfs_add_sdev() which doesn't. Since the second set is entirely
spurious, simply remove it to fix the problem.
Reported-by: Zengxi Chen <chenzengxi@huawei.com>
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
does not support JBOD sequence map
commit d5573584429254a14708cf8375c47092b5edaf2c upstream.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
30secs before reset
commit 18e1c7f68a5814442abad849abe6eacbf02ffd7c upstream.
For SRIOV enabled firmware, if there is a OCR(online controller reset)
possibility driver set the convert flag to 1, which is not happening if
there are outstanding commands even after 180 seconds. As driver does
not set convert flag to 1 and still making the OCR to run, VF(Virtual
function) driver is directly writing on to the register instead of
waiting for 30 seconds. Setting convert flag to 1 will cause VF driver
will wait for 30 secs before going for reset.
Signed-off-by: Kiran Kumar Kasturi <kiran-kumar.kasturi@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One small fix for a regression in a prior fix (again).
This time the condition in the prior fix BUG_ON proved to be wrong
under certain circumstances causing a BUG to trigger where it
shouldn't in the lpfc driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: fix oops/BUG in lpfc_sli_ringtxcmpl_put()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four small fixes.
The be2iscsi is a potential device overrun in consistent memory, which
could have nasty consequences if the consistent allocations are
packed.
The hpsa one fixes a regression where older controllers can now get a
numbering clash between the first internal disk and the controller.
The libfc one is a regression in timespec conversions which causes a
user visible issue in a command line tool and the mpt3sas one fixes a
regression where the controller could remain permanently blocked after
an ATA pass through command followed by a reset"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: be2iscsi: allocate enough memory in beiscsi_boot_get_sinfo()
scsi: mpt3sas: Unblock device after controller reset
scsi: hpsa: use bus '3' for legacy HBA devices
scsi: libfc: fix seconds_since_last_reset miscalculation
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"The recent changes in ahci MSI handling need one more fix. Hopefully,
this restores parity with before.
The other two are minor fixes with both low impact and risk"
* 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: always fall back to single-MSI mode
libata-scsi: Fixup ata_gen_passthru_sense()
mvsas: fix error return code in mvs_task_prep()
|
|
Pull sparc fixes from David Miller:
"Two ugly build warning fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
dbri: Fix compiler warning
qlogicpti: Fix compiler warnings
|
|
qlogicpti uses '__u32' for dma handle while invoking kernel DMA APIs,
instead of using dma_addr_t. This hasn't caused any 'incompatible
pointer type' warning on SPARC because until now dma_addr_t is of
type u32. However, recent changes in SPARC ATU (iommu) enabled 64bit
DMA and therefore dma_addr_t became of type u64. This makes
'incompatible pointer type' warnings inevitable.
e.g.
drivers/scsi/qlogicpti.c: In function ‘qpti_map_queues’:
drivers/scsi/qlogicpti.c:813: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’
drivers/scsi/qlogicpti.c:822: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’
For the record, qlogicpti never executes on sun4v. Therefore even
though 64bit DMA is enabled on SPARC, qlogicpti continues to use
legacy iommu that guarantees DMA address is always in 32bit range.
This patch resolves aforementioned compiler warnings.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: thomas tai <thomas.tai@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The BUG_ON() recently introduced in lpfc_sli_ringtxcmpl_put() is hit in
the lpfc_els_abort() > lpfc_sli_issue_abort_iotag() >
lpfc_sli_abort_iotag_issue() function path [similar names], due to
'piocb->vport == NULL':
BUG_ON(!piocb || !piocb->vport);
This happens because lpfc_sli_abort_iotag_issue() doesn't set the
'abtsiocbp->vport' pointer -- but this is not the problem.
Previously, lpfc_sli_ringtxcmpl_put() accessed 'piocb->vport' only if
'piocb->iocb.ulpCommand' is neither CMD_ABORT_XRI_CN nor
CMD_CLOSE_XRI_CN, which are the only possible values for
lpfc_sli_abort_iotag_issue():
lpfc_sli_ringtxcmpl_put():
if ((unlikely(pring->ringno == LPFC_ELS_RING)) &&
(piocb->iocb.ulpCommand != CMD_ABORT_XRI_CN) &&
(piocb->iocb.ulpCommand != CMD_CLOSE_XRI_CN) &&
(!(piocb->vport->load_flag & FC_UNLOADING)))
lpfc_sli_abort_iotag_issue():
if (phba->link_state >= LPFC_LINK_UP)
iabt->ulpCommand = CMD_ABORT_XRI_CN;
else
iabt->ulpCommand = CMD_CLOSE_XRI_CN;
So, this function path would not have hit this possible NULL pointer
dereference before.
In order to fix this regression, move the second part of the BUG_ON()
check prior to the pointer dereference that it does check for.
For reference, this is the stack trace observed. The problem happened
because an unsolicited event was received - a PLOGI was received after
our PLOGI was issued but not yet complete, so the discovery state
machine goes on to sw-abort our PLOGI.
kernel BUG at drivers/scsi/lpfc/lpfc_sli.c:1326!
Oops: Exception in kernel mode, sig: 5 [#1]
<...>
NIP [...] lpfc_sli_ringtxcmpl_put+0x1c/0xf0 [lpfc]
LR [...] __lpfc_sli_issue_iocb_s4+0x188/0x200 [lpfc]
Call Trace:
[...] [...] __lpfc_sli_issue_iocb_s4+0xb0/0x200 [lpfc] (unreliable)
[...] [...] lpfc_sli_issue_abort_iotag+0x2b4/0x350 [lpfc]
[...] [...] lpfc_els_abort+0x1a8/0x4a0 [lpfc]
[...] [...] lpfc_rcv_plogi+0x6d4/0x700 [lpfc]
[...] [...] lpfc_rcv_plogi_plogi_issue+0xd8/0x1d0 [lpfc]
[...] [...] lpfc_disc_state_machine+0xc0/0x2b0 [lpfc]
[...] [...] lpfc_els_unsol_buffer+0xcc0/0x26c0 [lpfc]
[...] [...] lpfc_els_unsol_event+0xa8/0x220 [lpfc]
[...] [...] lpfc_complete_unsol_iocb+0xb8/0x138 [lpfc]
[...] [...] lpfc_sli4_handle_received_buffer+0x6a0/0xec0 [lpfc]
[...] [...] lpfc_sli_handle_slow_ring_event_s4+0x1c4/0x240 [lpfc]
[...] [...] lpfc_sli_handle_slow_ring_event+0x24/0x40 [lpfc]
[...] [...] lpfc_do_work+0xd88/0x1970 [lpfc]
[...] [...] kthread+0x108/0x130
[...] [...] ret_from_kernel_thread+0x5c/0xbc
<...>
Cc: stable@vger.kernel.org # v4.8
Fixes: 22466da5b4b7 ("lpfc: Fix possible NULL pointer dereference")
Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
|
|
We accidentally allocate sizeof(u32) instead of sizeof(struct
be_cmd_get_session_resp).
Fixes: 50a4b824be9e ("scsi: be2iscsi: Fix to make boot discovery non-blocking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
While issuing any ATA passthrough command to firmware the driver will
block the device. But it will unblock the device only if the I/O
completes through the ISR path. If a controller reset occurs before
command completion the device will remain in blocked state.
Make sure we unblock the device following a controller reset if an ATA
passthrough command was queued.
[mkp: clarified patch description]
Cc: <stable@vger.kernel.org> # v4.4+
Fixes: ac6c2a93bd07 ("mpt3sas: Fix for SATA drive in blocked state, after diag reset")
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Older controllers use SCSI target id '0' for the first internal disk. As
the controllers are now placed on the same bus as the internal disks
this leads to a clash with the SCSI target id of controller. This patch
checks the SCSI revision, and moves older controller to bus '3' to be
compatible with older releases and avoid this problem.
[mkp: fixed uninitialized variable]
Fixes: 09371d623c9 ("hpsa: Change SAS transport devices to bus 0.")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes.
One prevents timeouts on mpt3sas when trying to use the secure erase
protocol which causes the erase protocol to be aborted. The second is
a regression in a prior fix which causes all commands to abort during
PCI extended error recovery, which is incorrect because PCI EEH is
independent from what's happening on the FC transport"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: do not abort all commands in the adapter during EEH recovery
scsi: mpt3sas: Fix secure erase premature termination
|
|
Commit 540eb1eef0ab ("scsi: libfc: fix seconds_since_last_reset calculation")
removed the use of 'struct timespec' from fc_get_host_stats(). This broke the
output of 'fcoeadm -s' after kernel 4.8-rc1.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org> # v4.8+
Fixes: 540eb1eef0ab ("scsi: libfc: fix seconds_since_last_reset calculation")
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
|
|
The previous commit 1535aa75a3d8 ("qla2xxx: fix invalid DMA access after
command aborts in PCI device remove") introduced a regression during an
EEH recovery, since the change to the qla2x00_abort_all_cmds() function
calls qla2xxx_eh_abort(), which verifies the EEH recovery condition but
handles it heavy-handed. (commit a465537ad1a4 "qla2xxx: Disable the
adapter and skip error recovery in case of register disconnect.")
This problem warrants a more general/optimistic solution right into
qla2xxx_eh_abort() (eg in case a real command abort arrives during EEH
recovery, or if it takes long enough to trigger command aborts); but
it's still worth to add a check to ensure the code added by the previous
commit is correct and contained within its owner function.
This commit just adds a 'if (!ha->flags.eeh_busy)' check around it.
(ahem; a trivial fix for this -rc series; sorry for this oversight.)
With it applied, both PCI device remove and EEH recovery works fine.
Fixes: 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove")
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The megaraid_sas patch in here fixes a major regression in the last
fix set that made all megaraid_sas cards unusable. It turns out no-one
had actually tested such an "obvious" fix, sigh. The fix for the fix
has been tested ...
The next most serious is the vmw_pvscsi abort problem which basically
means that aborts don't work on the vmware paravirt devices and error
handling always escalates to reset.
The rest are an assortment of missed reference counting in certain
paths and corner case bugs that show up on some architectures"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove
scsi: qla2xxx: do not queue commands when unloading
scsi: libcxgbi: fix incorrect DDP resource cleanup
scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
scsi: scsi_dh_alua: Fix a reference counting bug
scsi: vmw_pvscsi: return SUCCESS for successful command aborts
scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk
scsi: scsi_dh_alua: fix missing kref_put() in alua_rtpg_work()
|
|
This is a work around for a bug with LSI Fusion MPT SAS2 when perfoming
secure erase. Due to the very long time the operation takes, commands
issued during the erase will time out and will trigger execution of the
abort hook. Even though the abort hook is called for the specific
command which timed out, this leads to entire device halt
(scsi_state terminated) and premature termination of the secure erase.
Set device state to busy while ATA passthrough commands are in progress.
[mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]
Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Chaitra P B <chaitra.basappa@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
|
|
This patch will fix regression caused by commit 1e793f6fc0db ("scsi:
megaraid_sas: Fix data integrity failure for JBOD (passthrough)
devices").
The problem was that the MEGASAS_IS_LOGICAL macro did not have braces
and as a result the driver ended up exposing a lot of non-existing SCSI
devices (all SCSI commands to channels 1,2,3 were returned as
SUCCESS-DID_OK by driver).
[mkp: clarified patch description]
Fixes: 1e793f6fc0db920400574211c48f9157a37e3945
Reported-by: Jens Axboe <axboe@kernel.dk>
CC: stable@vger.kernel.org
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Tested-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If a command is aborted in the kernel but not in the adapter, it might be
considered complete and its DMA memory released, but it is still alive in
the adapter, which will trigger an invalid DMA access upon its completion
(in the DMA operations to deliver the command response to the driver).
On powerpc platforms with IOMMU/EEH capabilities, the problem is observed
during PCI device removal with ongoing IO requests -- which might trigger
an EEH event very often, pointing to a 'TCE Request Page Access Error'.
In that path, which is qla2x00_remove_one(), the commands are aborted in
qla2x00_abort_all_cmds(), which does not perform an abort in the adapter
as is done in qla2xxx_eh_abort() for example.
So, this patch changes qla2x00_abort_all_cmds() to abort commands in the
adapter too, with a call to qla2xxx_eh_abort(), which already implements
all the logic to submit abort requests and handle responses.
Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When the driver is unloading, in qla2x00_remove_one(), there is a single
call/point in time to abort ongoing commands, qla2x00_abort_all_cmds(),
which is still several steps away from the call to scsi_remove_host().
If more commands continue to arrive and be processed during that
interval, when the driver is tearing down and releasing its structures,
it might potentially hit an oops due to invalid memory access:
Unable to handle kernel paging request for data at address 0x00000138
<...>
NIP [d000000004700a40] qla2xxx_queuecommand+0x80/0x3f0 [qla2xxx]
LR [d000000004700a10] qla2xxx_queuecommand+0x50/0x3f0 [qla2xxx]
So, fail commands in qla2xxx_queuecommand() if the UNLOADING bit is set.
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Before calling task_release_itt() task data is memset to zero because of
which DDP context information is lost resulting in incorrect DDP
resource cleanup, to fix this call task_release_itt() before memset.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two more important data integrity fixes related to RAID device drivers
which wrongly throw away the SYNCHRONIZE CACHE command in the non-RAID
path and a memory leak in the scsi_debug driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware
scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded
scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
|
|
A system can get hung task timeouts if a qlogic board fails during
initialization (if the board breaks again or fails the init). The hang
involves the scsi scan.
In a nutshell, since commit beb9e315e6e0 ("qla2xxx: Prevent removal and
board_disable race"):
...it is possible to have freed ha (base_vha->hw) early by a call to
qla2x00_remove_one when pdev->enable_cnt equals zero:
if (!atomic_read(&pdev->enable_cnt)) {
scsi_host_put(base_vha->host);
kfree(ha);
pci_set_drvdata(pdev, NULL);
return;
Almost always, the scsi_host_put above frees the vha structure
(attached to the end of the Scsi_Host we're putting) since it's the last
put, and life is good. However, if we are entering this routine because
the adapter has broken sometime during initialization AND a scsi scan is
already in progress (and has done its own scsi_host_get), vha will not
be freed. What's worse, the scsi scan will access the freed ha structure
through qla2xxx_scan_finished:
if (time > vha->hw->loop_reset_delay * HZ)
return 1;
The scsi scan keeps checking to see if a scan is complete by calling
qla2xxx_scan_finished. There is a timeout value that limits the length
of time a scan can take (hw->loop_reset_delay, usually set to 5
seconds), but this definition is in the data structure (hw) that can get
freed early.
This can yield unpredictable results, the worst of which is that the
scsi scan can hang indefinitely. This happens when the freed structure
gets reused and loop_reset_delay gets overwritten with garbage, which
the scan obliviously uses as its timeout value.
The fix for this is simple: at the top of qla2xxx_scan_finished, check
for the UNLOADING bit in the vha structure (_vha is not freed at this
point). If UNLOADING is set, we exit the scan for this adapter
immediately. After this last reference to the ha structure, we'll exit
the scan for this adapter, and continue on.
This problem is hard to hit, but I have run into it doing negative
testing many times now (with a test specifically designed to bring it
out), so I can verify that this fix works. My testing has been against a
RHEL7 driver variant, but the bug and patch are equally relevant to to
the upstream driver.
Fixes: beb9e315e6e0 ("qla2xxx: Prevent removal and board_disable race")
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The code at the end of alua_rtpg_work() is as follows:
scsi_device_put(sdev);
kref_put(&pg->kref, release_port_group);
In other words, alua_rtpg_queue() must hold an sdev reference and a pg
reference before queueing rtpg work. If no rtpg work is queued no
additional references should be held when alua_rtpg_queue() returns. If
no rtpg work is queued, ensure that alua_rtpg_queue() only gives up the
sdev reference if that reference was obtained by the same
alua_rtpg_queue() call.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reported-by: Tang Junhui <tang.junhui@zte.com.cn>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Tang Junhui <tang.junhui@zte.com.cn>
Cc: <stable@vger.kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The vmw_pvscsi driver reports most successful aborts as FAILED to the
scsi error handler. This is do to a misunderstanding of how
completion_done() works and its interaction with a successful wait using
wait_for_completion_timeout(). The vmw_pvscsi driver is expecting
completion_done() to always return true if complete() has been called on
the completion structure. But completion_done() returns true after
complete() has been called only if no function like
wait_for_completion_timeout() has seen the completion and cleared it as
part of successfully waiting for the completion.
Instead of using completion_done(), vmw_pvscsi should just use the
return value from wait_for_completion_timeout() to know if the wait
timed out or not.
[mkp: bumped driver version per request]
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Acked-by: Jim Gill <jgill@vmware.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
While merging mpt3sas & mpt2sas code, we added the is_warpdrive check
condition on the wrong line
---------------------------------------------------------------------------
scsih_target_alloc(struct scsi_target *starget)
sas_target_priv_data->handle = raid_device->handle;
sas_target_priv_data->sas_address = raid_device->wwid;
sas_target_priv_data->flags |= MPT_TARGET_FLAGS_VOLUME;
- raid_device->starget = starget;
+ sas_target_priv_data->raid_device = raid_device;
+ if (ioc->is_warpdrive)
+ raid_device->starget = starget;
}
spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
return 0;
------------------------------------------------------------------------------
That check should be for the line sas_target_priv_data->raid_device =
raid_device;
Due to above hunk, we are not initializing raid_device's starget for
raid volumes, and so during raid disk deletion driver is not calling
scsi_remove_target() API as driver observes starget field of
raid_device's structure as NULL.
Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: <stable@vger.kernel.org> # v4.4+
Fixes: 7786ab6aff9 ("mpt3sas: Ported WarpDrive product SSS6200 support")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Reference count of pg leaks in alua_rtpg_work() since kref_put() is not
called to decrease the reference count of pg when the condition
pg->rtpg_sdev==NULL satisfied (actually it is easy to satisfy), it would
cause memory of pg leakage.
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Cc: <stable@vger.kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Fix to return error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes: one is a fatal section mismatch (reference to init
after it's discarded) and the other two are iscsi locking fixes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: NCR5380: no longer mark irq probing as __init
scsi: be2iscsi: Replace _bh with _irqsave/irqrestore
scsi: libiscsi: Fix locking in __iscsi_conn_send_pdu
|
|
|
|
The arcmsr driver failed to pass SYNCHRONIZE CACHE to controller
firmware. Depending on how drive caches are handled internally by
controller firmware this could potentially lead to data integrity
problems.
Ensure that cache flushes are passed to the controller.
[mkp: applied by hand and removed unused vars]
Cc: <stable@vger.kernel.org>
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
map_storep was not being vfree()'d in the module_exit call.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Commit 02b01e010afe ("megaraid_sas: return sync cache call with
success") modified the driver to successfully complete SYNCHRONIZE_CACHE
commands without passing them to the controller. Disk drive caches are
only explicitly managed by controller firmware when operating in RAID
mode. So this commit effectively disabled writeback cache flushing for
any drives used in JBOD mode, leading to data integrity failures.
[mkp: clarified patch description]
Fixes: 02b01e010afeeb49328d35650d70721d2ca3fd59
CC: stable@vger.kernel.org
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five small fixes.
Some of these, like the nested spinlock overwriting saved flags and
the Kasan use after free look serious, but they seem not to have been
picked up in testing or seen in the field.
The biggest user visible issue is probably the wrong device handler
for Clariion, which means that alua doesn't bind to the array like it
should"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ipr: Fix async error WARN_ON
scsi: zfcp: spin_lock_irqsave() is not nestable
scsi: Remove one useless stack variable
scsi: Fix use-after-free
scsi: Replace wrong device handler name for CLARiiON arrays
|
|
|
|
Merge the gup_flags cleanups from Lorenzo Stoakes:
"This patch series adjusts functions in the get_user_pages* family such
that desired FOLL_* flags are passed as an argument rather than
implied by flags.
The purpose of this change is to make the use of FOLL_FORCE explicit
so it is easier to grep for and clearer to callers that this flag is
being used. The use of FOLL_FORCE is an issue as it overrides missing
VM_READ/VM_WRITE flags for the VMA whose pages we are reading
from/writing to, which can result in surprising behaviour.
The patch series came out of the discussion around commit 38e088546522
("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
which addressed a BUG_ON() being triggered when a page was faulted in
with PROT_NONE set but having been overridden by FOLL_FORCE.
do_numa_page() was run on the assumption the page _must_ be one marked
for NUMA node migration as an actual PROT_NONE page would have been
dealt with prior to this code path, however FOLL_FORCE introduced a
situation where this assumption did not hold.
See
https://marc.info/?l=linux-mm&m=147585445805166
for the patch proposal"
Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.
[ This branch was rebased recently to add a few more acked-by's and
reviewed-by's ]
* gup_flag-cleanups:
mm: replace access_process_vm() write parameter with gup_flags
mm: replace access_remote_vm() write parameter with gup_flags
mm: replace __access_remote_vm() write parameter with gup_flags
mm: replace get_user_pages_remote() write/force parameters with gup_flags
mm: replace get_user_pages() write/force parameters with gup_flags
mm: replace get_vaddr_frames() write/force parameters with gup_flags
mm: replace get_user_pages_locked() write/force parameters with gup_flags
mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
mm: remove write/force parameters from __get_user_pages_unlocked()
mm: remove write/force parameters from __get_user_pages_locked()
mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
|