Data delivery with no single point of failure

We built a platform to deliver large amounts of scientific data over a wide area network, with no single point of failure.

We were asked to build a system hosting an application that publishes large amounts of scientific data on a wide area network. The application allows designated users to select and download subsets of a terabyte-scale data source.

Infrastructure requirements included reliability, 24x7 availability, adaptability and expandability.

We built the platform using the following components:

Hardware: a pair of 16-core Intel Linux servers, each with 30 TB of disk space, connected by a 10 Gb/s Ethernet link, which serves as the service network, and by a standard Gigabit Ethernet.

Software: we coupled a "private cloud" solution (Ganeti) with a "software-defined storage" system (Gluster). Both components are freely available open-source software.

Ganeti is an open-source Google project, described as cluster-based virtualization management software. It is a tool built on top of existing virtualization technologies such as Xen or KVM, plus DRBD storage (RAID 1 over Ethernet). Ganeti is designed to ease the management of clusters of virtual servers and to provide fast and simple recovery after physical failures, using commodity hardware.
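As a minimal sketch of how this works in practice (node names, OS template and disk size are illustrative, not our actual configuration), a DRBD-backed virtual machine is created and recovered roughly like this:

    # Create a VM whose disk is mirrored via DRBD between two nodes
    # (node1 primary, node2 secondary); names and sizes are illustrative.
    gnt-instance add -t drbd -n node1:node2 \
        -o debootstrap+default --disk 0:size=30g datavm1

    # If node1 fails, restart the VM on its DRBD secondary:
    gnt-instance failover --ignore-consistency datavm1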

GlusterFS is an open-source distributed file system that clusters together server resources (namely storage building blocks) over InfiniBand RDMA or TCP/IP interconnects, aggregating disk and memory resources and managing data in a single global namespace.
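For example (volume name, brick paths and mount point are illustrative), a two-way replicated volume spanning the two servers can be created and mounted roughly as follows:

    # Create and start a volume with two replicas, one brick per server:
    gluster volume create scidata replica 2 \
        server1:/bricks/scidata server2:/bricks/scidata
    gluster volume start scidata

    # Clients mount it with the native FUSE client; naming a backup
    # volfile server avoids depending on a single node at mount time:
    mount -t glusterfs -o backupvolfile-server=server2 \
        server1:/scidata /mnt/scidata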

The interesting feature of both systems is that they have no single point of failure, because both are based on a peer-to-peer architecture in which any peer can act as master. Moreover, virtual machines (Ganeti with DRBD) and data (Gluster) are kept in two replicas.
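To illustrate (a sketch, not a full recovery procedure): if the node currently acting as Ganeti master dies, any master-capable peer can take over, while Gluster peers need no promotion at all, since the volume has no metadata master:

    # Run on a surviving master-capable node to promote it to master:
    gnt-cluster master-failover

    # Gluster peers are symmetric; any node can answer for the volume:
    gluster peer status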

The system is also adaptable and expandable: performance (memory and computational resources) can be increased by adding new servers to the service network, and obsolete components can be replaced with newer systems without performance degradation or downtime. This is a major difference from traditional systems such as SAN-based clusters.
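A minimal sketch of the expansion procedure, assuming two hypothetical new machines (node3/server3 and node4/server4; all names are illustrative):

    # Add a new server to the Ganeti cluster:
    gnt-node add node3.example.com

    # Add new servers to the Gluster pool and grow the volume;
    # bricks are added in multiples of the replica count (here, pairs):
    gluster peer probe server3
    gluster peer probe server4
    gluster volume add-brick scidata replica 2 \
        server3:/bricks/scidata server4:/bricks/scidata
    gluster volume rebalance scidata start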

Finally, the use of virtual machines eases the evolution of the software components, because the development, testing and production environments can live on the same infrastructure.