Backward Error Recovery in Redundant Disk Arrays

William V. Courtright II and Garth A. Gibson

CMU CS Technical Report CMU-CS-94-193

Abstract

Redundant disk arrays are single-fault tolerant, incorporating a layer of error handling not found in non-redundant disk systems. Recovery from these errors is complex, due in part to the large number of erroneous states the system may reach. The established approach to error recovery in disk systems is to transition directly from an erroneous state to completion. This technique, known as forward error recovery, relies upon the context in which an error occurs to determine the steps required to reach completion, which implies that forward error recovery is design-specific. Forward error recovery requires the enumeration of all erroneous states the system may reach and the construction of a forward path from each erroneous state. We propose a method of error recovery which does not rely upon the enumeration of erroneous states or the context in which they occur. When an error is encountered, we advocate mechanized recovery to an error-free state from which an operation may be retried. Using a form of backward error recovery, we are able to manage the complexity of error recovery in redundant disk arrays without sacrificing performance.
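As an illustration only, the general idea of backward error recovery can be sketched in a few lines of Python. The sketch assumes an operation can be expressed as an ordered list of steps, each paired with an action that undoes it; this is a generic undo-and-retry pattern, not the mechanism developed in this report. All names (`StepFailed`, `run_with_backward_recovery`, the `state` dictionary, and the step functions) are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: each step is a (do, undo) pair of callables.
Step = Tuple[Callable[[], None], Callable[[], None]]


class StepFailed(Exception):
    """Raised by a step that encounters an error (illustrative only)."""


def run_with_backward_recovery(steps: List[Step], max_attempts: int = 2) -> int:
    """Run steps in order. On an error, undo the completed steps in reverse
    order to return to the error-free starting state, then retry.
    Returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        completed_undos: List[Callable[[], None]] = []
        try:
            for do, undo in steps:
                do()
                completed_undos.append(undo)
            return attempt
        except StepFailed:
            # Recovery does not inspect which step failed or why; it only
            # reverses whatever had already completed.
            for undo in reversed(completed_undos):
                undo()
    raise StepFailed("operation failed after %d attempts" % max_attempts)


# Usage: a two-step write whose second step fails once with a transient error.
state = {"data": 0, "parity": 0}
pending_faults = [True]  # one injected fault, assumed for the example


def write_data() -> None:
    state["data"] = 1


def undo_data() -> None:
    state["data"] = 0


def write_parity() -> None:
    if pending_faults:
        pending_faults.pop()
        raise StepFailed("transient parity write error")
    state["parity"] = 1


def undo_parity() -> None:
    state["parity"] = 0


attempts = run_with_backward_recovery(
    [(write_data, undo_data), (write_parity, undo_parity)]
)
```

The recovery path in this sketch is the same regardless of where the error occurs, which is the property that distinguishes it from forward error recovery: no per-state recovery path has to be enumerated.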