Replication Versus RAID for Distributed Storage Systems

In today’s large-scale distributed storage systems, vast amounts of user data are stored across a large number of nodes and disks. High availability and increased reliability require that data be stored in a redundant manner. We consider two popular redundancy schemes: replication and erasure coding. In particular, we consider RAID-type distributed storage systems. New analytical models are developed to assess the system reliability in terms of the mean time to data loss, the storage efficiency, and the I/O throughput performance. Furthermore, we address the issue of placement of the redundant data in the nodes, and examine its effect. The models are then extended to analytically assess the impact of unrecoverable or latent media errors encountered on disk drives. We propose using the intradisk redundancy scheme to cope with these types of errors and enhance the reliability of the storage systems. Our analytical results show that distributed RAID-5 systems enhanced by the intradisk redundancy scheme provide improved reliability compared with mirroring replication systems. They also require less storage space, but incur I/O performance degradation.
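As a rough illustration of two of the metrics named in the abstract, the sketch below compares storage efficiency and mean time to data loss (MTTDL) for a mirrored pair and a single RAID-5 group. It uses only the classic textbook approximations, which assume independent exponential disk failures and a repair time much shorter than the time to failure. These are not the models developed in the report: they ignore data placement across nodes, latent media errors, and intradisk redundancy. All function names and parameter values are hypothetical.

```python
# Textbook approximations only; NOT the report's analytical models.
# Assumes independent exponential disk failures and MTTR << MTTF.

def storage_efficiency_replication(replicas: int) -> float:
    """Fraction of raw capacity holding user data with `replicas` copies."""
    return 1.0 / replicas


def storage_efficiency_raid5(group_size: int) -> float:
    """Fraction of raw capacity holding user data in an N-disk RAID-5 group."""
    return (group_size - 1) / group_size


def mttdl_mirrored_pair(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate MTTDL of a two-disk mirror: MTTF^2 / (2 * MTTR)."""
    return mttf_hours ** 2 / (2 * mttr_hours)


def mttdl_raid5(group_size: int, mttf_hours: float, mttr_hours: float) -> float:
    """Approximate MTTDL of an N-disk RAID-5 group: MTTF^2 / (N * (N-1) * MTTR)."""
    return mttf_hours ** 2 / (group_size * (group_size - 1) * mttr_hours)


if __name__ == "__main__":
    mttf, mttr = 500_000.0, 24.0  # hypothetical disk MTTF and rebuild time, in hours
    print("mirroring efficiency:", storage_efficiency_replication(2))
    print("RAID-5 (8 disks) efficiency:", storage_efficiency_raid5(8))
    print("mirrored pair MTTDL (h):", mttdl_mirrored_pair(mttf, mttr))
    print("RAID-5 (8 disks) MTTDL (h):", mttdl_raid5(8, mttf, mttr))
```

Under these simplified assumptions the sketch only captures the basic space-versus-reliability trade-off; the effects of latent errors and intradisk redundancy that the report analyzes lie outside it.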

By: Ilias Iliadis, Robert Haas

Published in: RZ3733 in 2009


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

