Fault-tolerant describes a computer system or component designed so that, if a component fails, a backup component or procedure can immediately take its place with no loss of service. Fault tolerance can be implemented in software, built into hardware, or provided by a combination of the two.
In the software implementation, the operating system provides an interface that allows a programmer to "checkpoint" critical data at predetermined points within a transaction. In the hardware implementation (for example, with Stratus and its VOS operating system), the programmer does not need to be aware of the fault-tolerant capabilities of the machine.
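The checkpointing idea can be illustrated with a minimal Python sketch. It is not the interface of any particular operating system: the function names (save_checkpoint, load_checkpoint, run_transaction), the JSON file format, and the fail_at parameter used to simulate a crash are all assumptions made for illustration. The program records its progress after each step, so a restarted run resumes from the last checkpoint instead of starting over.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then atomically rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename
    os.replace(tmp_path, path)  # a reader sees the old or the new file, never a partial one


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run_transaction(path, items, fail_at=None):
    """Sum items, checkpointing after each one (hypothetical workload).

    fail_at simulates a component failure just before that index is processed.
    """
    state = load_checkpoint(path, {"next_index": 0, "total": 0})
    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += items[i]
        state["next_index"] = i + 1
        save_checkpoint(path, state)  # predetermined checkpoint: after every item
    return state["total"]
```

If a first call to run_transaction fails partway through, a second call with the same checkpoint path picks up at the first unprocessed item, so no item is counted twice. Writing to a temporary file and renaming it is one way to keep a failure during the checkpoint itself from corrupting the saved state.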
At a hardware level, fault tolerance is achieved by duplexing each hardware component. Disks are mirrored. Multiple processors are "lock-stepped" together and their outputs are compared for correctness. When an anomaly occurs, the faulty component is identified and taken out of service, and the machine continues to function as usual.
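The compare-and-isolate behavior can be mimicked in software with a small Python sketch. This is an analogy only and does not describe how Stratus or any other hardware implements lock-stepping. It assumes three replicas and a majority vote (a technique usually called triple modular redundancy), since a simple comparison between two units can detect a disagreement but cannot by itself say which unit is wrong. The names run_lockstep, cpu_a, cpu_b, and cpu_c are hypothetical.

```python
from collections import Counter


def run_lockstep(replicas, value):
    """Run every in-service replica on the same input and compare the outputs.

    replicas maps a replica name to a function. Replicas whose output differs
    from the majority are removed from the dict (taken out of service).
    Returns the agreed result and the list of replicas that were removed.
    """
    outputs = {name: fn(value) for name, fn in replicas.items()}
    majority, votes = Counter(outputs.values()).most_common(1)[0]
    if votes <= len(outputs) // 2:
        raise RuntimeError("no majority: cannot identify the faulty replica")
    faulty = [name for name, out in outputs.items() if out != majority]
    for name in faulty:
        del replicas[name]  # isolate the faulty replica; the others keep running
    return majority, faulty


# Hypothetical replicas: cpu_c has a fault that corrupts its result.
replicas = {
    "cpu_a": lambda x: x * x,
    "cpu_b": lambda x: x * x,
    "cpu_c": lambda x: x * x + 1,
}

result, removed = run_lockstep(replicas, 4)
print(result, removed)  # prints: 16 ['cpu_c']
```

After the faulty replica is removed, later calls continue with the two remaining replicas and return the same answers, which mirrors the machine continuing to function as usual.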