Scalable Algorithms for Global Snapshots in Distributed Systems : Extended Version

Existing algorithms for global snapshots in distributed systems are not scalable when the underlying topology is complete. In a network with N processors, these algorithms require O(N) space and O(N) messages per processor. As a result, these algorithms are not efficient in large systems when the logical topology of the communication layer such as MPI is complete.

In this paper, we propose three algorithms for global snapshot: a grid-based, a tree-based and a centralized algorithm. The grid-based algorithm uses O(N) space but only O(N1/2) messages per processor. The tree-based algorithm requires only O(1) space and O(log N log w) messages per processor where w is the average number of messages in transit per processor. The centralized algorithm requires only O(1) space and O( log w) messages per processor. We also show a matching lower bound for this problem.

Our algorithms have applications in checkpointing, detecting stable predicates and implementing synchronizers. We have implemented our algorithms on top of the MPI library on the Blue Gene/L supercomputer. Our experiments confirm that the proposed algorithms significantly reduce the message and space complexity of a global snapshot.

By: Rahul Garg, Vijay K. Garg, Yogish Sabharwal

Published in: RI06003 in 2006


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.


Questions about this service can be mailed to .