Thesis Proposals

Master's Thesis Proposals

Some preliminary considerations on the proposed thesis topics

  • The topics proposed below have a strong research-oriented component and focus on the areas of distributed systems, autonomic computing and capacity planning. They represent an excellent opportunity to gain experience in these challenging research fields, cooperating with one of the world's leading research groups in the area (the Distributed Systems Group), which publishes regularly in top scientific conferences and journals and is part of one of the best Portuguese research institutions, namely INESC-ID.
  • Each proposed topic explores highly innovative ideas. Thus, provided that they are appropriately studied, implemented and evaluated, they are likely to lead to one (or possibly more) scientific publications. This is not only an excellent additional item for any student's CV, but also an essential factor in achieving a high final grade for the dissertation. In fact, the vast majority of the theses that I have supervised so far have resulted in at least one publication.
    On the other hand, the proposed theses are challenging works that demand, on the student's side, commitment and a genuine will to challenge their own learning and reasoning skills. So, if you are looking for an easy/sloppy topic, you probably do not want to continue reading what follows.
    On my side, you will be able to count on my full availability to accompany you in your work, and to provide you with all the elements and support needed to fulfil the objectives of the thesis.

  • All the proposed themes have the potential to serve as a starting point for a possible PhD thesis, provided of course that they are adequately developed.
  • The best way to pick the right thesis topic for you is to speak with the proposing advisors. If any of the topics below interests you, please contact me by email so that we can schedule a short meeting to get to know each other a little better.



Themes proposed for 2012/2013

Self-tuning data replication in large scale transactional data grids

Area

Distributed systems, data replication, autonomic computing

Context

This thesis will focus on the area of large scale transactional data platforms, such as Cassandra, Infinispan and Coherence.

In order to maximize scalability, these platforms rely on genuine partial replication mechanisms, which place a static bound on the number of copies of each data item in the system and rely on random hashing techniques to scatter the data uniformly across the nodes of the platform.

The downside of these approaches is that they fail to take into account the data access locality of applications, which dramatically increases the probability of incurring expensive network communications to fetch data remotely from other nodes while processing.
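To make the problem concrete, the following is a minimal sketch of hash-based placement with a fixed replication degree. It is a simplified stand-in written for illustration, not the algorithm actually used by Infinispan or any other platform; the class name `HashPlacement`, its methods and the key names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative hash-based placement with a fixed number of copies (hypothetical, simplified). */
final class HashPlacement {

    /** Returns the indices of the nodes that hold a copy of the given key. */
    static List<Integer> owners(String key, int numNodes, int replicationDegree) {
        if (numNodes <= 0 || replicationDegree <= 0 || replicationDegree > numNodes) {
            throw new IllegalArgumentException("need 0 < replicationDegree <= numNodes");
        }
        // The hash alone selects the first owner: access patterns play no role.
        int first = Math.floorMod(key.hashCode(), numNodes);
        List<Integer> result = new ArrayList<>(replicationDegree);
        for (int i = 0; i < replicationDegree; i++) {
            // The remaining copies go to the successors of the first owner.
            result.add((first + i) % numNodes);
        }
        return result;
    }

    /** True if a transaction running on the given node has to fetch the key remotely. */
    static boolean isRemote(String key, int node, int numNodes, int replicationDegree) {
        return !owners(key, numNodes, replicationDegree).contains(node);
    }

    public static void main(String[] args) {
        int numNodes = 10;
        int replicationDegree = 2;
        // Two items that an application might always access together
        // can still end up on unrelated nodes.
        for (String key : new String[] {"order:17", "customer:42"}) {
            System.out.println(key + " -> " + owners(key, numNodes, replicationDegree));
        }
    }
}
```

Because the owners depend only on the hash of the key, a node that accesses an item very frequently has no better chance of holding a copy than any other node.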

Objectives

The objective of the thesis will be the design, development and evaluation of locality-aware data replication techniques that will self-tune the placement of replicas of data across the platform in order to maximize data locality and hence applications' performance.

The self-tuning mechanism will have to deal with three main challenges:

  • architecting lightweight/space-efficient mechanisms to spread across the platform the information concerning the mapping between data items and nodes.
  • ensuring the consistency of the transactional data accesses performed by applications even in the presence of concurrent data relocations.
  • identifying the best candidates to maintain replicas of data taking into account also the inherent costs associated with replication.
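As a rough illustration of the third challenge, the sketch below picks the nodes that should hold extra replicas of a single data item by weighing the remote reads a replica would save against the cost of keeping it up to date. The cost model (one fixed cost per remote read, one fixed cost per update propagated to a replica) and all names (`ReplicaSelector`, `chooseReplicaNodes`, the node identifiers) are assumptions made for this example, not part of the proposal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical greedy selection of replica holders for one data item. */
final class ReplicaSelector {

    /**
     * Returns at most maxReplicas nodes for which holding a replica pays off:
     * the remote reads saved must outweigh the cost of propagating every write.
     */
    static List<String> chooseReplicaNodes(Map<String, Long> readsPerNode, long writes,
                                           double remoteReadCost, double updateCost,
                                           int maxReplicas) {
        // Every replica must receive every write, whichever node holds it.
        double costPerReplica = writes * updateCost;

        List<Map.Entry<String, Long>> candidates = new ArrayList<>(readsPerNode.entrySet());
        // Most frequent readers first; ties are broken by node name for determinism.
        candidates.sort((a, b) -> {
            int byReads = Long.compare(b.getValue(), a.getValue());
            return byReads != 0 ? byReads : a.getKey().compareTo(b.getKey());
        });

        List<String> chosen = new ArrayList<>();
        for (Map.Entry<String, Long> candidate : candidates) {
            if (chosen.size() == maxReplicas) {
                break;
            }
            double benefit = candidate.getValue() * remoteReadCost - costPerReplica;
            if (benefit > 0) {
                chosen.add(candidate.getKey());
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, Long> reads = new HashMap<>();
        reads.put("nodeA", 100L);
        reads.put("nodeB", 40L);
        reads.put("nodeC", 5L);
        // With 10 writes at cost 2.0 each, a replica costs 20: nodeC does not pay off.
        System.out.println(chooseReplicaNodes(reads, 10, 1.0, 2.0, 3));
    }
}
```

A real mechanism would also have to gather the access statistics cheaply and move data without violating transactional consistency, which are the first two challenges listed above.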

Requirements

I strongly encourage potential candidates to arrange a short meeting to discuss the details of the proposal before applying. Simply send me an email to schedule a meeting.

Expected Results

  1. Java-based prototype of the self-tuning data replication mechanisms, integrated with Infinispan (www.infinispan.org), one of the mainstream open-source transactional data grids.
  2. Detailed performance evaluation study assessing its effectiveness and practical viability.

International collaborations

This thesis work will be carried out in the scope of the European project Cloud-TM, which aims to develop a self-optimizing middleware platform that simplifies the development and administration of applications deployed on cloud computing infrastructures.

The Cloud-TM consortium is composed of international representatives of Academia (IST and CINI) and Industry (Red Hat, Algorithmica), thus giving the student the opportunity to come into contact with international experts and to work on challenging, cutting-edge topics that are of interest to a very broad community.

The results of this thesis will be integrated with one of the mainstream open source transactional data grids, namely Infinispan (www.infinispan.org) by Red Hat, which is also a partner of Cloud-TM. The thesis will provide plenty of opportunities to collaborate closely with the Infinispan developers' team and to contribute code to some core components of the Cloud-TM platform and/or of Infinispan.

Possibility of Scholarships

A scholarship will be provided by the Cloud-TM project to support this thesis work.


Elastic auto scaling of transactional data grids in cloud environments

Area

Distributed Systems, Cloud Computing, Capacity Planning

Context

Over the last few years, Cloud Computing has emerged as a disruptive paradigm for the future generation of IT services.

In the cloud, resources are dispensed “elastically”, with a seemingly unbounded amount of computational power and storage available on demand under a pay-only-for-what-you-use pricing model. Just as the electric grid revolutionized access to electricity one hundred years ago, freeing corporations from having to generate their own power and enabling them to concentrate on their business differentiators, cloud computing is hailed as revolutionizing IT, freeing corporations from large IT capital investments and enabling them to plug into extremely powerful computing resources over the network.

The issue of data management in cloud computing environments is one of the hottest research areas of the moment, both in the academic and industrial communities.

This thesis will focus on the area of elastic transactional data grids, namely distributed transactional data platforms that are capable of dynamically adjusting their scale (number of nodes) to meet the characteristics of the incoming workload.

Objectives

The objective of this thesis is to build a "Transactional AutoScaler" (TAS), namely a module in charge of elastically scaling a transactional data grid on the basis of the actual workload demands.

TAS will consist of two main modules:

  1. The "performance predictor", which, given the current workload characterization, will forecast the performance of the platform when deployed over a different number of nodes. Thanks to its predictive power, this module will be able to determine the *minimum* scale of the system capable of sustaining the current (or future) load, thus minimizing the operational costs of the data grid.
  2. The "reconfiguration manager", which will orchestrate the actual reconfiguration of the data grid. This module will not only automate the acquisition and release of nodes at runtime from the underlying private/public cloud, but also enforce the synchronization among the new set of replicas, to guarantee data consistency in the presence of dynamic reconfigurations of the platform.
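The way the first module could drive the second can be sketched as follows: given a function that predicts throughput for a candidate number of nodes, search for the smallest scale that sustains the target load. The class `MinimumScale`, its methods and in particular the toy contention model in `toyPredictor` are hypothetical placeholders for this example; the actual predictor is precisely what the thesis is expected to design.

```java
import java.util.OptionalInt;
import java.util.function.IntToDoubleFunction;

/** Hypothetical search for the minimum scale that sustains a target load. */
final class MinimumScale {

    /**
     * Returns the smallest node count in [minNodes, maxNodes] whose predicted
     * throughput reaches the target, or empty if no such scale exists.
     * A linear scan is used because throughput need not grow monotonically
     * with the number of nodes when transactions contend for data.
     */
    static OptionalInt find(IntToDoubleFunction predictedThroughput, double targetThroughput,
                            int minNodes, int maxNodes) {
        for (int nodes = minNodes; nodes <= maxNodes; nodes++) {
            if (predictedThroughput.applyAsDouble(nodes) >= targetThroughput) {
                return OptionalInt.of(nodes);
            }
        }
        return OptionalInt.empty();
    }

    /** Toy stand-in for a real predictor: 1000 tx/s per node, degraded by contention. */
    static double toyPredictor(int nodes) {
        return nodes * 1000.0 / (1 + 0.1 * (nodes - 1));
    }

    public static void main(String[] args) {
        OptionalInt scale = find(MinimumScale::toyPredictor, 3000.0, 1, 50);
        if (scale.isPresent()) {
            // A reconfiguration manager would now acquire or release nodes to reach this scale.
            System.out.println("Minimum scale: " + scale.getAsInt() + " nodes");
        } else {
            System.out.println("No scale within bounds sustains the target load");
        }
    }
}
```

Under the toy model, three nodes are predicted to deliver 2500 tx/s and four nodes about 3077 tx/s, so a target of 3000 tx/s yields a minimum scale of four nodes.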

Methodologies that will be employed/learnt during the thesis

The performance forecasting models will be based both on analytical methods, e.g. queuing theory or stochastic modeling techniques, and on machine learning tools, e.g. neural networks, decision trees, Q-learning.

The student is not expected to have a background in the above areas, and will be assisted in learning their theoretical foundations and the tools that exploit them.

International Collaborations

This thesis work will be carried out in the scope of the European project Cloud-TM, which aims to develop a self-optimizing middleware platform that simplifies the development and administration of applications deployed on cloud computing infrastructures.

The Cloud-TM consortium is composed of international representatives of Academia (IST and CINI) and Industry (Red Hat, Algorithmica), thus giving the student the opportunity to come into contact with international experts and to work on challenging, cutting-edge topics that are of interest to a very broad community. TAS will be integrated with one of the mainstream open source transactional data grids, namely Infinispan by Red Hat, which is also a partner of Cloud-TM.

The thesis will provide plenty of opportunities to collaborate closely with the Infinispan team and to contribute code to some core components of the Cloud-TM platform.

Possibility of Scholarships

The Cloud-TM project will provide a scholarship to support this thesis work.