The 1st ACM SIGOPS European Chapter Senior Workshop:

"Promoting excellence in European systems research"

July 12-13, 2005

Holiday Inn, Lisbon, Portugal


 

Tuesday 12th July

12:00 - 12:05

Welcome and opening remarks

12:05 - 13:05

Amazon.com: E-Commerce at Interplanetary Scale

Werner Vogels
VP & CTO - Amazon.com

When, in the coming years, space research makes a breakthrough and space colonization becomes a reality, Amazon.com will be the preferred way to order your retail products anywhere in the universe.

Amazon.com runs the world's largest online retail operation, and it is growing to prepare itself for an even more universal presence. Already the current scale of its operations is unparalleled, and the software to run its distributed operation is so complex that no software vendor is able to deliver a robust and scalable solution. As a result, Amazon.com has become one of the premier high-tech companies developing globally scalable distributed systems. We have set out a course to incorporate technologies based on Chaos, Biology, and Evolution principles into the software to be able to guarantee robust operation at galactic scale.

This presentation focuses on the evolution of the Amazon.com architecture and its scalability and robustness challenges. It will review a number of the principles that play a role in the development of the software at Amazon.com to create services that are able to self-organize and self-regenerate, and that can achieve service stability in a world that is continuously in flux. 

13:05 - 14:15

Lunch

14:15 - 16:00

Replicating snapshot-isolation-based databases

Willy Zwaenepoel
EPFL

Snapshot isolation is now commonly used in databases such as Oracle, PostgreSQL, and newer versions of Microsoft SQL Server. The semantics of snapshot isolation are hard to implement efficiently in replicated databases. We define a slightly weaker semantics, called generalized snapshot isolation, that allows efficient replication but maintains most or all of the desirable aspects of the original definition.

We have started work on a middleware implementation of replicated generalized snapshot isolation on top of replicas that locally implement snapshot isolation. One of the lessons learned so far from this implementation effort is that for performance reasons it appears desirable to implement durability in the replication middleware and not in the database, as is traditionally done. In the talk I will explain the reasons behind this observation, the benefits that can be extracted from it, and some of the difficulties of realizing these benefits with existing databases.
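As background for the above, here is a deliberately tiny, single-process sketch of the snapshot isolation rule the abstract builds on: transactions read from the snapshot taken when they began, and a first-committer-wins check aborts a transaction whose writes overlap a commit made after its snapshot. All class and method names are invented for illustration; roughly speaking, the generalized variant discussed in the talk relaxes the requirement that the snapshot be the very latest committed state, which is what makes it attractive for replication.

    # Minimal sketch of snapshot isolation (first-committer-wins).
    # Illustrative only; not the implementation discussed in the talk.

    class SnapshotStore:
        def __init__(self):
            self.versions = {}      # key -> list of (commit_ts, value)
            self.commit_ts = 0      # logical commit counter

        def begin(self):
            return Transaction(self, snapshot_ts=self.commit_ts)

        def _read(self, key, snapshot_ts):
            # Return the latest version committed at or before the snapshot.
            for ts, value in reversed(self.versions.get(key, [])):
                if ts <= snapshot_ts:
                    return value
            return None

        def _commit(self, txn):
            # First-committer-wins: abort if any written key has a version
            # committed after this transaction's snapshot.
            for key in txn.writes:
                versions = self.versions.get(key, [])
                if versions and versions[-1][0] > txn.snapshot_ts:
                    raise RuntimeError(f"write-write conflict on {key!r}")
            self.commit_ts += 1
            for key, value in txn.writes.items():
                self.versions.setdefault(key, []).append((self.commit_ts, value))

    class Transaction:
        def __init__(self, store, snapshot_ts):
            self.store, self.snapshot_ts, self.writes = store, snapshot_ts, {}

        def read(self, key):
            # Reads see the transaction's own writes first, then the snapshot.
            return self.writes.get(key, self.store._read(key, self.snapshot_ts))

        def write(self, key, value):
            self.writes[key] = value

        def commit(self):
            self.store._commit(self)

    # Example: two transactions updating the same key; the later committer aborts.
    store = SnapshotStore()
    t0 = store.begin(); t0.write("x", 1); t0.commit()
    t1, t2 = store.begin(), store.begin()
    t1.write("x", t1.read("x") + 1)
    t2.write("x", t2.read("x") + 1)
    t1.commit()
    # t2.commit() would now raise: its snapshot predates t1's commit on "x".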


Xen and the Art of Storage Virtualization

Steven Hand
Cambridge University

The resurgence of interest in Virtual Machine Monitors and the facility to dynamically deploy and migrate individual virtual machines lead to new storage problems. In particular, VMs are not well served by existing NAS, SAN, or cluster file systems, which fail to scale, are overly complex, and/or are too expensive. In this talk I present our current designs and implementations for storage virtualization targeted at thousands or even millions of virtual machines, and point toward ongoing and future work.

Cool Running: Dynamic Thermal Management in Operating Systems

Frank Bellosa
University of Erlangen

With increasing clock speed and level of integration in today's processors, memories, and I/O controllers, power dissipation is becoming a defining concern of system design. Control-theoretic techniques have proven able to manage heat dissipation and temperature, from the level of functional blocks within the processor up to the level of complete systems, so that a thermal emergency is never reached. However, application-, user-, or service-specific requirements have so far had to be neglected.

This talk presents challenges and promising solutions for energy-aware system software to prevent the increasing power dissipation from becoming a show-stopper.
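As a rough illustration of the control-loop style the abstract refers to, the sketch below shows a toy proportional controller that throttles processor speed as a temperature reading approaches a limit. All thresholds, gains, and names are invented for illustration; this is not the speaker's system.

    # Toy proportional thermal controller (illustrative constants only).

    T_LIMIT = 95.0      # emergency threshold in degrees Celsius (assumed)
    T_TARGET = 85.0     # set point the controller steers toward (assumed)
    K_P = 0.05          # proportional gain (assumed)

    def next_speed(current_speed, temperature):
        """Reduce relative clock speed as temperature exceeds the target."""
        error = temperature - T_TARGET
        if error > 0:
            speed = current_speed - K_P * error
        else:
            speed = min(1.0, current_speed + 0.01)   # slowly ramp back up
        return max(0.1, min(1.0, speed))             # clamp to available range

    # e.g. called periodically with readings from a thermal sensor
    speed = 1.0
    for temp in [80, 88, 93, 96, 91, 86]:
        speed = next_speed(speed, temp)
        print(f"temp={temp:>4.1f}C  speed={speed:.2f}")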

16:00 - 16:30

Afternoon tea break

16:30 - 18:15

Virtual Ring Routing: Overlay routing at the network layer

Miguel Castro
Microsoft Research

Traditional routing protocols for wireless networks do not scale because they use floods to discover routes or to disseminate topology information. Recent work has proposed landmark and coordinate-based routing protocols that scale but require an infrastructure to map fixed identifiers to the current location of a node. Maintaining this infrastructure introduces overhead and complexity.

We propose Virtual Ring Routing (VRR), a scalable routing protocol for wireless networks. Nodes in VRR have fixed, location-independent numeric identifiers. They are organized into a virtual ring such that a node's neighbors in the ring are the nodes whose identifiers are closest to it on the left and on the right. VRR requires each node to maintain only a small number of paths to its immediate neighbors in the virtual ring, and it does not require flooding to set up or maintain these paths. VRR uses these paths to route messages between any pair of nodes in the network without any route discovery overhead or delay, but with a small increase in hop count. Our results show that this trade-off pays off: VRR can route messages with lower end-to-end delays and higher delivery ratios than AODV, DSR, and DSDV. Additionally, VRR provides a distributed hash table interface that can be used to implement decentralized self-organizing services.
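To make the abstract's routing rule concrete, the toy sketch below shows greedy forwarding on a circular identifier space: a node picks, among the endpoints it has paths toward, the one whose identifier is closest to the destination. The identifier-space size and all names are assumptions for illustration, not details of VRR itself.

    # Greedy forwarding on a virtual ring (hypothetical names, not VRR code).

    ID_SPACE = 2**16  # size of the circular identifier space (assumed)

    def ring_distance(a, b):
        """Distance between two identifiers on the virtual ring."""
        d = abs(a - b) % ID_SPACE
        return min(d, ID_SPACE - d)

    def next_hop(routing_table, destination):
        """
        routing_table: dict mapping endpoint identifier -> next hop on the
        physical path toward that endpoint (as maintained by the protocol).
        Forward toward the known endpoint closest to the destination.
        """
        best_endpoint = min(routing_table,
                            key=lambda e: ring_distance(e, destination))
        return routing_table[best_endpoint]

    # Example: a node keeps paths toward ring neighbors 0x1F00 and 0x2400.
    table = {0x1F00: "radio-link-A", 0x2400: "radio-link-B"}
    print(next_hop(table, destination=0x23FF))  # forwards on "radio-link-B"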


PeerReview: Detecting Deviant Behavior in Distributed Systems

Peter Druschel
Max Planck Institute

Distributed systems have complex failure modes and are prone to a variety of security attacks. In one class of attacks, a participating node that is controlled by an attacker deviates from the normal protocol in subtle ways, e.g., to exploit or disrupt the system, deny service to users, or censor content. While several methods have been proposed to detect particular attacks of this type, a general technique to detect deviant behavior remains elusive. In this talk, I'll sketch PeerReview, an auditing technique that combines secure logging, random auditing, and execution replay to detect a wide range of deviant behavior in large-scale decentralized systems. If a majority of correct nodes share the same behavior, they can identify a malicious node simply by comparing its behavior to their own. We have demonstrated the feasibility of PeerReview by applying it to, and evaluating it in the context of, a decentralized and serverless email system that is based on a distributed hash table.
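The secure-logging ingredient mentioned above is typically built from a hash chain; the sketch below shows the basic idea of a tamper-evident log that an auditor can verify by replaying it. Field names and formats are invented for illustration and are not PeerReview's actual log format.

    # Tamper-evident log via a hash chain (illustrative sketch).

    import hashlib

    def entry_hash(prev_hash, seq, event):
        data = f"{prev_hash}|{seq}|{event}".encode()
        return hashlib.sha256(data).hexdigest()

    def append(log, event):
        prev_hash = log[-1][2] if log else "0" * 64
        seq = len(log)
        log.append((seq, event, entry_hash(prev_hash, seq, event)))

    def verify(log):
        prev_hash = "0" * 64
        for seq, event, h in log:
            if h != entry_hash(prev_hash, seq, event):
                return False  # retroactive edit (or mid-log truncation) detected
            prev_hash = h
        return True

    log = []
    append(log, "recv:msg-17")
    append(log, "send:ack-17")
    assert verify(log)
    log[0] = (0, "recv:msg-99", log[0][2])   # retroactively alter an event...
    assert not verify(log)                   # ...and the chain no longer verifies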

Fast Topology Management in Large Overlay Networks

Ozalp Babaoglu
University of Bologna

Topology is key to understanding robustness and function in many different contexts, including social networks, biological systems, technological networks, and P2P overlays. I will motivate and define the "topology management problem" in overlay networks.

I will then present a simple generic protocol, T-Man, for constructing and maintaining large classes of topologies in a fully decentralized manner. T-Man draws inspiration from "cell adhesion", a mechanism used to explain pattern formation in biological development and regeneration. T-Man is self-organizing, scalable, robust, and extremely fast. It can be used directly to satisfy application topology needs on demand, or to recover or bootstrap other protocols such as DHTs.
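For readers who want a feel for the approach, the sketch below mimics the gossip-style neighbor-selection step that T-Man-like protocols rely on: each node keeps a small view, periodically merges views with a peer, and retains the entries best ranked by a topology-specific distance function. It is a simplified, single-process illustration with invented names, not the actual protocol.

    # Gossip-based topology construction (simplified illustration).

    import random

    VIEW_SIZE = 4

    def rank(node_id, candidates, distance):
        """Keep the VIEW_SIZE candidates closest to node_id."""
        return sorted(set(candidates) - {node_id},
                      key=lambda c: distance(node_id, c))[:VIEW_SIZE]

    def gossip_round(views, distance):
        """One round: every node merges its view with a random peer's view."""
        for node in list(views):
            view = views[node]
            if not view:
                continue
            peer = random.choice(view)
            merged = view + views[peer] + [node, peer]
            views[node] = rank(node, merged, distance)
            views[peer] = rank(peer, merged, distance)

    # Target topology: a ring over 0..63 (distance = ring distance).
    N = 64
    ring = lambda a, b: min(abs(a - b), N - abs(a - b))
    views = {n: random.sample([m for m in range(N) if m != n], VIEW_SIZE)
             for n in range(N)}
    for _ in range(15):
        gossip_round(views, ring)
    # After a few rounds each node's view converges toward its nearest ring neighbors.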

19:00 - 21:00

Workshop Banquet

21:00 - 22:00

SIGOPS European Chapter/EuroSys Business Meeting (FREE BAR)

Wednesday 13th July

08:00 - 09:00

Breakfast

09:00 - 10:45

The Transactional Manifesto

Maurice Herlihy
Microsoft Research and Brown University

Computer architecture is about to undergo, if not another revolution, then a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches, providing increased concurrency instead of increased clock speed.

As a result, system designers and software engineers can no longer rely on increasing clock speed to hide software bloat. Instead, they must somehow learn to make effective use of increasing parallelism. This adaptation will not be easy. Conventional synchronization techniques based on locks and conditions are unlikely to be effective in such a demanding environment. Coarse-grained locks, which protect relatively large amounts of data, do not scale, and fine-grained locks introduce substantial software engineering problems.

Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions. This synchronization model promises to alleviate many (perhaps not all) of the problems associated with locking, and there is a growing community of researchers working on both software and hardware support for this approach. This talk will survey the area, with a focus on open research problems.
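As a hint of what the programming model looks like, the sketch below implements a toy software transactional memory with an atomic retry loop: the transaction body runs optimistically against read and write sets and is re-executed if validation at commit time fails. The API and the coarse validation scheme are simplifications invented here, not any of the hardware or software systems surveyed in the talk.

    # Toy software transactional memory (hypothetical API, coarse validation).

    import threading

    class TVar:
        """A transactional variable: (value, version) updated only at commit."""
        def __init__(self, value):
            self.value, self.version = value, 0

    _commit_lock = threading.Lock()  # serialises only the short commit step

    def atomic(txn_body):
        while True:
            reads, writes = {}, {}          # tvar -> version read / value to write
            def read(tv):
                if tv in writes:
                    return writes[tv]
                reads.setdefault(tv, tv.version)
                return tv.value
            def write(tv, value):
                writes[tv] = value
            result = txn_body(read, write)
            with _commit_lock:
                # Validate: retry if anything we read has since changed.
                if all(tv.version == ver for tv, ver in reads.items()):
                    for tv, value in writes.items():
                        tv.value, tv.version = value, tv.version + 1
                    return result
            # conflict detected: loop and re-execute the transaction body

    # Example: a transfer that is atomic without per-account locks.
    a, b = TVar(100), TVar(0)
    def transfer(read, write, amount=30):
        write(a, read(a) - amount)
        write(b, read(b) + amount)
        return read(a) + read(b)

    print(atomic(transfer))  # total is preserved (prints 100)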


Flexible approaches to consistency of replicated data

Marc Shapiro
INRIA Rocquencourt and LIP6

Replication and consistency are essential features of any distributed system and have been studied extensively. Protocols differ greatly (despite all claiming to maintain consistency) but a systematic comparison is lacking. In practice, there seems to be no protocol that is both decentralised and maintains good semantic properties.

To fill this need, we developed the Action-Constraint Framework to capture both the semantics of replicated data and the behaviour of a replication algorithm. It enables us to decompose the problem of ensuring consistency into simpler, easily understandable sub-problems. As the sub-problems are largely orthogonal, sub-solutions can be mixed and matched. Our unified framework enables a systematic exploration of both pessimistic and optimistic protocols, both full and partial replication, strong and weak consistency, etc.

I will present some preliminary results, including a serialisation protocol with good decentralisation properties, and simulations comparing a number of representative sub-solution combinations. Finally, I will present the Joyce implementation of the framework, supporting cooperative applications. Joint work with Nishith Krishna and James O'Brien, performed at Microsoft Research Cambridge.

Directions in System Engineering: combining DSLs, Aspects and Components

Gilles Muller
École des Mines de Nantes

Engineering systems software is known as one of the most difficult programming tasks. In fact, it raises numerous challenges that have as yet been only partially addressed. Some of these are the evolution of legacy systems, the safety of systems code, and the high degree of expertise required to develop low-level services. Overall, this calls for dedicated methodologies and tools for capturing the specific needs and properties of systems software.

In this talk, we first review several recent approaches that we have successfully applied, such as Domain-Specific Languages, Aspects, and Components. We then describe the synergy between these approaches and conclude by presenting some of the new challenges they raise.

10:45 - 11:15

Morning coffee

11:15 - 12:30

Work-in-Progress Session 1

Rethinking OS support for high-speed networking
Herbert Bos

Sprint: adaptive data management for in-memory database clusters
Fernando Pedone

Demand for high performance and availability, combined with plummeting hardware prices, has led to the widespread emergence of large computing clusters. Most typically, cluster nodes are interconnected through very fast network switches and equipped with powerful processors and large main memories. These hardware trends invalidate many fundamental design decisions of current systems and require a re-evaluation of data structures and algorithms, adapted to the new environment. Moreover, from an operational perspective, configuring applications for such environments becomes too complex to be done manually. Sprint exploits the characteristics of modern computing clusters to achieve highly efficient and available adaptive data management. In this short talk I will overview some aspects of Sprint's architecture and some open problems we have identified.

Nizza - Towards small, application-specific Trusted Computing Bases
Hermann Haertig

Automating DBMS Configuration: What if you could ask "what if"?
Dushyanth Narayanan

12:30 - 14:00

Lunch

14:00 - 15:00

Work-in-Progress Session 2

On Detours and Shortcuts to solve distributed systems problems
Paulo Esteves Veríssimo

AutoPatch
Christof Fetzer

The AutoPatch project tries to increase the dependability (i.e., security, reliability, and availability) of software by automatically patching the code. The system will identify certain issues, i.e., design failures, and generate patches to fix them until manual patches become available.

Thresher: A Filtering Archive for High-frequency Snapshots
Liuba Shrira


A snapshot system keeps past states of data so that read-only applications can run against consistent past state, so-called back-in-time execution. Decreasing storage costs and new efficient versioned storage techniques have enabled a new generation of snapshot systems that make it possible to capture virtually unlimited amounts of past state.

Current snapshot systems suffer from the limitation that, while it is easy to collect lots of past states, it is difficult for the storage system to distinguish between important and unimportant states (that is, to filter). This makes it hard to retain important states over longer time scales, or to store them with faster access.

We describe Thresher, the first high-performance non-disruptive snapshot system that provides the ability to filter important states, that is, to disentangle important incremental updates from the others after the fact.

15:00

Workshop close

 


Workshop organizers:

General Chair: Antony Rowstron, Microsoft Research
Local Arrangements Chair: Paulo Ferreira, INESC