Despite its benefits, causal consistency is not trivial to guarantee: one has to track causal dependencies and subsequently ensure that operations are delivered in causal order. Interestingly, the granularity at which causal dependencies are tracked significantly impacts the system's performance. When causal dependencies are tracked precisely, the costs of processing and transferring metadata have a significant impact on throughput. This impact can be mitigated by compressing metadata, reducing the amount of metadata handled. Nevertheless, compression comes at the cost of precision, which penalizes remote visibility latency (the delay before an operation's effect becomes observable at remote replicas) due to the creation of false causal dependencies: two concurrent operations that become ordered as an artifact of the metadata management. This tension between throughput and remote visibility latency is inherent to previous work, and it is typically exacerbated when one wants to support partial replication.
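To illustrate how compressed metadata creates false causal dependencies, the following sketch (hypothetical names and structure, not taken from the thesis) contrasts precise tracking with vector clocks against compression into a single scalar timestamp:

```python
# Illustrative sketch: precise dependency metadata (vector clocks) versus
# compressed metadata (one scalar timestamp per update).

def concurrent(vc_a, vc_b):
    """Two updates are concurrent iff neither vector clock dominates the other."""
    a_le_b = all(vc_a[k] <= vc_b[k] for k in vc_a)
    b_le_a = all(vc_b[k] <= vc_a[k] for k in vc_b)
    return not a_le_b and not b_le_a

# Two updates issued at different datacenters, unaware of each other:
u1 = {"vc": {"dc_A": 3, "dc_B": 0}, "scalar": 3}
u2 = {"vc": {"dc_A": 0, "dc_B": 4}, "scalar": 4}

# Precise metadata correctly classifies them as concurrent ...
print(concurrent(u1["vc"], u2["vc"]))  # True

# ... but the compressed metadata forces an order: u2 appears to depend on
# u1, so a replica receiving u2 first would delay making it visible until
# u1 arrives. This is a false causal dependency.
print(u1["scalar"] < u2["scalar"])     # True
```

The sketch shows why shrinking metadata trades remote visibility latency for throughput: the scalar is cheap to store and ship, but it can no longer distinguish concurrency from causality.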
This thesis proposes a set of techniques that, combined, alleviate this tension, allowing designers of causally consistent geo-replicated systems to optimize both throughput and remote visibility latency simultaneously, and to attain genuine partial replication, a key property to ensure scalability as the number of geo-locations increases. The key technique is a novel metadata dissemination service, which relies on a set of metadata brokers organized in a tree topology. This thesis experimentally demonstrates that, when the topology is well configured, this mechanism makes it possible to implement genuine partial replication and to optimize remote visibility latency while keeping the size of the metadata small and constant, which is crucial to avoid impairing throughput. Furthermore, this service can be decoupled from the service responsible for managing the data, promoting modular architectures for geo-replicated systems.
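A minimal sketch of the broker-tree idea (the class, attribute names, and label format below are illustrative assumptions, not the thesis design): each update produces a small, constant-size label that brokers route down the tree, pruning subtrees that contain no replica of the updated item, so uninterested datacenters never handle the metadata.

```python
# Hypothetical sketch: metadata brokers in a tree forward constant-size
# labels only toward datacenters that replicate the updated item.

class Broker:
    def __init__(self, name):
        self.name = name
        self.children = []  # (child broker or datacenter name, items served below)

    def attach(self, child, items):
        self.children.append((child, set(items)))

    def route(self, label):
        """Deliver a label to every datacenter replicating the item,
        pruning subtrees with no interested replica."""
        reached = []
        for child, items in self.children:
            if label["item"] not in items:
                continue  # genuine partial replication: skip this subtree
            if isinstance(child, Broker):
                reached += child.route(label)
            else:
                reached.append(child)
        return reached

# An example topology: a root broker with regional brokers as children,
# and datacenters as leaves.
root, eu, us = Broker("root"), Broker("eu"), Broker("us")
root.attach(eu, ["x", "y"])
root.attach(us, ["y"])
eu.attach("dc_frankfurt", ["x", "y"])
eu.attach("dc_dublin", ["x"])
us.attach("dc_virginia", ["y"])

# A label is small and constant-size: item id, origin, sequence number.
label = {"item": "y", "origin": "dc_frankfurt", "seq": 42}
remote = [d for d in root.route(label) if d != label["origin"]]
print(remote)  # ['dc_virginia']: only remote replicas of "y" see the label
```

Because every label has the same small size regardless of the update's causal history, metadata costs stay constant as the system grows; causal order is preserved by how labels travel through the tree rather than by what they carry.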
The metadata dissemination service assumes that each datacenter is able to serialize, in an order consistent with causality, all updates issued locally. This thesis shows how this can be achieved efficiently by integrating services that operate outside the clients' critical operational path.
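The following sketch conveys the idea under stated assumptions (the `LocalSequencer` class and `handle_write` helper are hypothetical, not the thesis's mechanism): the datacenter acknowledges a client's write immediately, while a local sequencer, running off the critical path, assigns the update a position in a serialization consistent with causality, i.e., after everything the issuing client had already observed.

```python
# Hypothetical sketch: a per-datacenter sequencer that serializes local
# updates consistently with causality, outside the client's critical path.

import itertools

class LocalSequencer:
    def __init__(self):
        self._next = itertools.count(1)
        self.log = []  # the causally consistent local serialization

    def sequence(self, update, observed_seq):
        """Runs in the background; `observed_seq` is the highest sequence
        number the issuing client had observed when it wrote."""
        seq = next(self._next)
        # The counter is monotonic, so every new update is serialized
        # after everything the client could have observed locally.
        assert seq > observed_seq
        self.log.append((seq, update))
        return seq

def handle_write(partition, update, client, sequencer):
    partition[update["key"]] = update["value"]  # client is acked here
    # Sequencing happens after the ack, off the critical path:
    client["seen"] = sequencer.sequence(update, client["seen"])

store, client, seq = {}, {"seen": 0}, LocalSequencer()
handle_write(store, {"key": "x", "value": 1}, client, seq)
handle_write(store, {"key": "y", "value": 2}, client, seq)
print([s for s, _ in seq.log])  # [1, 2]: local updates serialized in order
```

The point of the design is that the client's latency is determined only by the data write; the ordering work needed by the metadata service happens asynchronously.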
We have built a prototype, named SATURN, that integrates all the aforementioned techniques. SATURN is designed as a metadata service that can be used in combination with several replicated data services. We evaluate SATURN on Amazon EC2 using realistic benchmarks under both full and partial geo-replication. Results show that weakly consistent datastores can rely on SATURN to upgrade their guarantees to causal consistency with a negligible performance penalty: only a 2% reduction in throughput and 11.7 ms of extra remote visibility latency in geo-replicated settings. Furthermore, our extensive evaluation shows that our techniques compare favorably to previous state-of-the-art solutions: SATURN achieves significantly higher throughput (by 38.3%) than solutions that favor remote visibility latency, while exhibiting significantly lower remote visibility latency (76.9 ms less on average) than solutions that favor high throughput.