Atomic Broadcast in Asynchronous Crash-Recovery Distributed Systems

L. Rodrigues and M. Raynal

Selected sections of this report will be published in the Proceedings of the 20th IEEE International Conference on Distributed Computing Systems, pp. 288-295, Taipei, Taiwan, April 2000.

Abstract

Atomic Broadcast is a fundamental problem of distributed systems: it requires that messages be delivered in the same order to all their destination processes. This paper describes a solution to this problem in asynchronous distributed systems in which processes can crash and recover.

A Consensus-based solution to the Atomic Broadcast problem was designed by Chandra and Toueg for asynchronous distributed systems in which crashed processes do not recover. Although our solution is based on different algorithmic principles, it follows the same approach: it transforms any Consensus protocol suited to the crash-recovery model into an Atomic Broadcast protocol suited to the same model. We show that Atomic Broadcast can be implemented without requiring any log operations beyond those required by the Consensus protocol. The paper also discusses how additional log operations can improve the protocol in terms of faster recovery and better throughput.

An extended report is also available (gzipped PostScript, PDF).
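To make the general idea of a Consensus-based reduction concrete, the following is a minimal, single-process simulation in Python. It is only a sketch of the generic approach (repeated Consensus instances, each deciding the next batch of messages to deliver), not the protocol described in this paper: crashes, recovery, and logging are not modeled at all. The names `toy_consensus`, `Process`, `run_rounds`, and `demo` are hypothetical, and `toy_consensus` is a trivial stand-in for a real Consensus protocol.

```python
from typing import Dict, FrozenSet, List, Set


def toy_consensus(proposals: Dict[int, FrozenSet[str]]) -> FrozenSet[str]:
    """Hypothetical stand-in for a real Consensus protocol.

    Every caller sees the same decision: the non-empty proposal of the
    lowest process id (or the empty set if all proposals are empty).
    """
    for pid in sorted(proposals):
        if proposals[pid]:
            return proposals[pid]
    return frozenset()


class Process:
    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.unordered: Set[str] = set()   # received but not yet ordered
        self.delivered: List[str] = []     # totally ordered delivery sequence

    def receive(self, msg: str) -> None:
        # Messages may arrive in any order, and some may be missed entirely.
        if msg not in self.delivered:
            self.unordered.add(msg)

    def apply_decision(self, batch: FrozenSet[str]) -> None:
        # Deliver the decided batch in a deterministic (sorted) order,
        # skipping anything already delivered.
        for msg in sorted(batch):
            if msg not in self.delivered:
                self.delivered.append(msg)
        self.unordered -= batch


def run_rounds(processes: List[Process]) -> None:
    # One Consensus instance per round; each round fixes the next batch.
    while any(p.unordered for p in processes):
        proposals = {p.pid: frozenset(p.unordered) for p in processes}
        decision = toy_consensus(proposals)
        for p in processes:
            p.apply_decision(decision)


def demo() -> List[List[str]]:
    procs = [Process(i) for i in range(3)]
    procs[0].receive("m2"); procs[0].receive("m1")
    procs[1].receive("m3")
    procs[2].receive("m1"); procs[2].receive("m3"); procs[2].receive("m2")
    run_rounds(procs)
    return [p.delivered for p in procs]
```

Calling `demo()` returns one delivery sequence per simulated process. Because every process applies the same decided batches in the same deterministic order, the sequences are identical even though the messages were received in different orders and process 1 never received `m1` or `m2` directly.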


Luís Rodrigues