summaryrefslogtreecommitdiff
path: root/drivers/scsi/cxlflash/superpipe.c
AgeCommit message (Collapse)Author
2016-05-06cxlflash: Fix to resolve dead-lock during EEH recoveryManoj N. Kumar
When a cxlflash adapter goes into EEH recovery and multiple processes (each having established its own context) are active, the EEH recovery can hang if the processes attempt to recover in parallel. The symptom logged after a couple of minutes is: INFO: task eehd:48 blocked for more than 120 seconds. Not tainted 4.5.0-491-26f710d+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. eehd 0 48 2 Call Trace: __switch_to+0x2f0/0x410 __schedule+0x300/0x980 schedule+0x48/0xc0 rwsem_down_write_failed+0x294/0x410 down_write+0x88/0xb0 cxlflash_pci_error_detected+0x100/0x1c0 [cxlflash] cxl_vphb_error_detected+0x88/0x110 [cxl] cxl_pci_error_detected+0xb0/0x1d0 [cxl] eeh_report_error+0xbc/0x130 eeh_pe_dev_traverse+0x94/0x160 eeh_handle_normal_event+0x17c/0x450 eeh_handle_event+0x184/0x370 eeh_event_handler+0x1c8/0x1d0 kthread+0x110/0x130 ret_from_kernel_thread+0x5c/0xa4 INFO: task blockio:33215 blocked for more than 120 seconds. Not tainted 4.5.0-491-26f710d+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. blockio 0 33215 33213 Call Trace: 0x1 (unreliable) __switch_to+0x2f0/0x410 __schedule+0x300/0x980 schedule+0x48/0xc0 rwsem_down_read_failed+0x124/0x1d0 down_read+0x68/0x80 cxlflash_ioctl+0x70/0x6f0 [cxlflash] scsi_ioctl+0x3b0/0x4c0 sg_ioctl+0x960/0x1010 do_vfs_ioctl+0xd8/0x8c0 SyS_ioctl+0xd4/0xf0 system_call+0x38/0xb4 INFO: task eehd:48 blocked for more than 120 seconds. The hang is because of a 3 way dead-lock: Process A holds the recovery mutex, and waits for eehd to complete. Process B holds the semaphore and waits for the recovery mutex. eehd waits for semaphore. The fix is to have Process B above release the semaphore before attempting to acquire the recovery mutex. This will allow eehd to proceed to completion. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-03-09cxlflash: Reorder user context initializationUma Krishnan
In order to support cxlflash in the PowerVM environment, underlying hypervisor APIs have imposed a kernel API ordering change. For the superpipe access to LUN, user applications need a context. The cxlflash module creates this context by making a sequence of cxl calls. In the current code, a context is initialized via cxl_dev_context_init() followed by cxl_process_element(), a function that obtains the process element id. Finally, cxl_start_work() is called to attach the process element. In the PowerVM environment, a process element id cannot be obtained from the hypervisor until the process element is attached. The cxlflash module is unable to create contexts without a valid process element id. To fix this problem, cxl_start_work() is called before obtaining the process element id. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-03-09cxlflash: Simplify attach path error cleanupMatthew R. Ochs
The cxlflash_disk_attach() routine currently uses a cascading error gate strategy for its error cleanup path. While this strategy is commonly used to handle cleanup scenarios, it is too restrictive when function callouts need to be restructured. Problems range from inserting error path bugs in previously 'good' code to the cleanup path imposing design changes to how the normal path is structured. A less restrictive approach is needed to support ordering changes that come about when operating in different environments. To overcome this restriction, the error cleanup path is modified to have a single entrypoint and use conditional logic to cleanup where necessary. Entities that require multiple cleanup steps must be carefully vetted to ensure their APIs support state. In cases where they do not (none as of this commit) additional local variables can be used to maintain state on their behalf. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-03-09cxlflash: Split out context initializationMatthew R. Ochs
Presently, context information structures are allocated and initialized in the same routine, create_context(). This imposes an ordering restriction such that all pieces of information needed to initialize a context must be known before the context is even allocated. This design point is not flexible when the order of context creation needs to be modified. Specifically, this can lead to problems when members of the context information structure are a part of an ordering dependency (i.e. - the 'work' structure embedded within the context). To remedy, the allocation is left as-is, inside of the existing create_context() routine and the initialization is transitioned to a new void routine, init_context(). At the same time, in anticipation of these routines not being called in sequence, a state boolean is added to the context information structure to track when the context has been initilized. The context teardown routine, destroy_context(), is modified to support being called with a non-initialized context. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-12-10cxlflash: drop unlikely before IS_ERR_OR_NULLGeliang Tang
IS_ERR_OR_NULL already contain an unlikely compiler flag. Drop it. Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-12-10cxlflash: a couple off by one bugsDan Carpenter
The "> MAX_CONTEXT" should be ">= MAX_CONTEXT". Otherwise we go one step beyond the end of the cfg->ctx_tbl[] array. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-10-30cxlflash: Fix to avoid bypassing context cleanupMatthew R. Ochs
Contexts may be skipped over for cleanup in situations where contention for the adapter's table-list mutex is experienced in the presence of a signal during the execution of the release handler. This can lead to two known issues: - A hang condition on remove as that path tries to wait for users to cleanup - something that will never complete should this scenario play out as the user has already cleaned up from their perspective. - An Oops in the unmap_mapping_range() call that is made as part of the user waiting mechanism that is invoked on remove when contexts are found to still exist. The root cause of this issue can be found in get_context() and how the table-list mutex is acquired. As this code path is shared by several different access points within the driver, a decision was made during the development cycle to acquire this mutex in this location using the interruptible version of the mutex locking service. In almost all of the use-cases and environmental scenarios this holds up, even when the mutex is contended. However, for critical system threads (such as the release handler), failing to acquire the mutex and bailing with the intention of the user being able to try again later is unacceptable. In such a scenario, the context _must_ be derived as it is on an irreversible path to being freed. Without being able to derive the context, the code mistakenly assumes that it has already been freed and proceeds to free up the underlying CXL context resources. From this point on, any usage of [the now stale] CXL context resources will result in undefined behavior. This is root cause of the Oops mentioned as the second known issue as the mapping passed to the unmap_mapping_range() service is owned by the CXL context. To fix this problem, acquisition of the table-list mutex within get_context() is simply changed to use the uninterruptible version of the mutex locking service. This is safe as the timing windows for holding this mutex are short and also protected against blocking. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Fix to avoid potential deadlock on EEHMatthew R. Ochs
Ioctl threads that use scsi_execute() can run for an excessive amount of time due to the fact that they have lengthy timeouts and retry logic built in. Under normal operation this is not an issue. However, once EEH enters the picture, a long execution time coupled with the possibility that a timeout can trigger entry to the driver via registered reset callbacks becomes a liability. In particular, a deadlock can occur when an EEH event is encountered while in running in scsi_execute(). As part of the recovery, the EEH handler drains all currently running ioctls, waiting until they have completed before proceeding with a reset. As the scsi_execute()'s are situated on the ioctl path, the EEH handler will wait until they (and the remainder of the ioctl handler they're associated with) have completed. Normally this would not be much of an issue aside from the longer recovery period. Unfortunately, the scsi_execute() triggers a reset when it times out. The reset handler will see that the device is already being reset and wait until that reset completed. This creates a condition where the EEH handler becomes stuck, infinitely waiting for the ioctl thread to complete. To avoid this behavior, temporarily unmark the scsi_execute() threads as an ioctl thread by releasing the ioctl read semaphore. This allows the EEH handler to proceed with a recovery while the thread is still running. Once the scsi_execute() returns, the ioctl read semaphore is reacquired and the adapter state is rechecked in case it changed while inside of scsi_execute(). The state check will wait if the adapter is still being recovered or returns a failure if the recovery failed. In the event that the adapter reset failed, the failure is simply returned as the ioctl would be unable to continue. Reported-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Fix to avoid corrupting adapter fopsMatthew R. Ochs
The fops owned by the adapter can be corrupted in certain scenarios, opening a window where certain fops are temporarily NULLed before being reset to their proper value. This can potentially lead software to make incorrect decisions, leaving the user with the inability to function as intended. An example of this behavior can be observed when there are a number of users with a high rate of turn around (attach to LUN, perform an I/O, detach from LUN, repeat). Every so often a user is given a valid context and adapter file descriptor, but the file associated with the descriptor lacks the correct read permission bit (FMODE_CAN_READ) and thus the read system call bails before calling the valid read fop. Background: The fops is stored in the adapter structure to provide the ability to lookup the adapter structure from within the fop handler. CXL services use the file's private_data and at present, the CXL context does not have a private section. In an effort to limit areas of the cxlflash driver with code specific the superpipe function, a design choice was made to keep the details of the fops situated away from the legacy portions of the driver. This drove the behavior that the adapter fops is set at the beginning of the disk attach ioctl handler when there are no users present. The corruption that this fix remedies is due to the fact that the fops is initially defaulted to values found within a static structure. When the fops is handed down to the CXL services later in the attach path, certain services are patched. The fops structure remains correct until the user count drops to 0 and the fops is reset, triggering the process to repeat again. The user counts are tightly coupled with the creation and deletion of the user context. If multiple users perform a disk attach at the same time, when the user count is currently 0, some users can be in the middle of obtaining a file descriptor and have not yet reached the context creation code that [in addition to creating the context] increments the user count. Subsequent users coming in to perform the attach see that the user count is still 0, and reinitialize the fops, temporarily removing the patched fops. The users that are in the middle obtaining their file descriptor may then receive an invalid descriptor. The fix simply removes the user count altogether and moves the fops initialization to probe time such that it is only performed one time for the life of the adapter. In the future, if the CXL services adopt a private member for their context, that could be used to store the adapter structure reference and cxlflash could revert to a model that does not require an embedded fops. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Correct spelling, grammar, and alignment mistakesMatthew R. Ochs
There are several spelling and grammar mistakes throughout the driver. Additionally there are a handful of places where there are extra lines and unnecessary variables/statements. These are a nuisance and pollute the driver. Fix spelling and grammar issues. Update some comments for clarity and consistency. Remove extra lines and a few unneeded variables/statements. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Fix to prevent EEH recovery failureMatthew R. Ochs
The process_sense() routine can perform a read capacity which can take some time to complete. If an EEH occurs while waiting on the read capacity, the EEH handler will wait to obtain the context's mutex in order to put the context in an error state. The EEH handler will sit and wait until the context is free, but this wait can potentially last forever (deadlock) if the scsi_execute() that performs the read capacity experiences a timeout and calls into the reset callback. When that occurs, the reset callback sees that the device is already being reset and waits for the reset to complete. This leaves two threads waiting on the other. To address this issue, make the context unavailable to new, non-system owned threads and release the context while calling into process_sense(). After returning from process_sense() the context mutex is reacquired and the context is made available again. The context can be safely moved to the error state if needed during the unavailable window as no other threads will hold its reference. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Fix MMIO and endianness errorsMatthew R. Ochs
Sparse uncovered several errors with MMIO operations (accessing directly) and handling endianness. These can cause issues when running in different environments. Introduce __iomem and proper endianness tags/swaps where appropriate to make driver sparse clean. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Correct naming of limbo state and waitqMatthew R. Ochs
Limbo is not an accurate representation of this state and is also not consistent with the terminology that other drivers use to represent this concept. Rename the state and and its associated waitq to 'reset'. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Fix to avoid CXL services during EEHMatthew R. Ochs
During an EEH freeze event, certain CXL services should not be called until after the hardware reset has taken place. Doing so can result in unnecessary failures and possibly cause other ill effects by triggering hardware accesses. This translates to a requirement to quiesce all threads that may potentially use CXL runtime service during this window. In particular, multiple ioctls make use of the CXL services when acting on contexts on behalf of the user. Thus, it is essential to 'drain' running ioctls _before_ proceeding with handling the EEH freeze event. Create the ability to drain ioctls by wrapping the ioctl handler call in a read semaphore and then implementing a small routine that obtains the write semaphore, effectively creating a wait point for all currently executing ioctls. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Fix to avoid sizeof(bool)Matthew R. Ochs
Using sizeof(bool) is considered poor form for various reasons and sparse warns us of that. Correct by changing type from bool to u8. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Fix potential oops following LUN removalMatthew R. Ochs
When a LUN is removed, the sdev that is associated with the LUN remains intact until its reference count drops to 0. In order to prevent an sdev from being removed while a context is still associated with it, obtain an additional reference per-context for each LUN attached to the context. This resolves a potential Oops in the release handler when a dealing with a LUN that has already been removed. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Fix read capacity timeoutManoj Kumar
The timeout value for read capacity is too small. Certain devices may take longer to respond and thus the command may prematurely timeout. Additionally the literal used for the timeout is stale. Update the timeout to 30 seconds (matches the value used in sd.c) and rework the timeout literal to a more appropriate description. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-30cxlflash: Replace magic numbers with literalsManoj Kumar
Magic numbers are not meaningful and can create confusion. As a remedy, replace them with descriptive literals. Replace 512 with literal MAX_SECTOR_UNIT. Replace 5 with literal CMD_RETRIES. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-08-27cxlflash: Virtual LUN supportMatthew R. Ochs
Add support for physical LUN segmentation (virtual LUNs) to device driver supporting the IBM CXL Flash adapter. This patch allows user space applications to virtually segment a physical LUN into N virtual LUNs, taking advantage of the translation features provided by this adapter. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-08-27cxlflash: Superpipe supportMatthew R. Ochs
Add superpipe supporting infrastructure to device driver for the IBM CXL Flash adapter. This patch allows userspace applications to take advantage of the accelerated I/O features that this adapter provides and bypass the traditional filesystem stack. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>