May 08, 2015

Lustre on EC2

(Distributed File Systems in the Cloud - Part 2)

In our first post we discussed some background on distributed file systems (DFS) and parallel distributed file systems (PDFS).  In this post, we’ll discuss our experience in setting up Lustre on Amazon EC2.

What We Built

Our goal was to build a simple Lustre cluster that allowed us to use EC2 ephemeral storage. This storage is “free” in the sense that, unlike EBS volumes, ephemeral storage is included in the base configuration of our EC2 instances at no additional cost. We were already running quite a bit of our infrastructure on EC2, but we were not using the ephemeral storage attached to our instances; instead, we were relying on Amazon’s EBS volumes for storage. A Lustre cluster built on ephemeral storage can replace those EBS volumes.

If you’re not familiar with Lustre architecture and the various components of a Lustre cluster, please see the excellent documentation in the Lustre manual, which is available online. In particular, refer to Section 1.1, “Understanding Lustre Architecture”.

We used the current stable feature release of Lustre, 2.7, which was released on March 17, 2015. We deployed the Lustre server components on CentOS 6.5, with a patched 2.6.32 kernel (2.6.32-504.8.1.el6_lustre.x86_64).

Our very simple Lustre cluster consisted of co-located Management Server (MGS) and Metadata Server (MDS), with associated storage for the Management Target (MGT) and Metadata Target (MDT). We started with a single Object Storage Server (OSS) and its disk storage for the Object Storage Target (OST). This simple design is shown below.

[Diagram: co-located MGS/MDS with MGT/MDT storage, plus a single OSS with its OST storage]

We could easily add storage capacity simply by including more OSTs or OSS/OST combinations. These can even be added after the initial cluster is up and running.

The “Workstation” was a CentOS 6.6 desktop running in our Danvers office with the following configuration:

  • 1 dual-core CPU @ 3.40GHz

  • 1 GB RAM

  • 20 GB local storage

Of course, the workstation’s local storage is not what we’re interested in!

A more robust implementation would be to place the MGS on a separate server from the MDS, with separate storage for each server, but the layout above is sufficient for our current needs. For further discussion and ideas, see the “Lustre Cluster” section of the manual (in particular “Figure 1.2.  Lustre cluster at scale”).

EC2 Infrastructure

EC2 provides many different instance types, with differing characteristics for CPU, RAM, and networking. A key component of Lustre is LNET, the networking layer that “glues” everything together in a Lustre deployment. We chose EC2 instance types with “High” network performance characteristics.

The table below shows the EC2 instances used in this evaluation.

[Table: EC2 instance types used for the Lustre servers and clients in this evaluation]

The storage available for the MGS/MDS and clients, of course, is of less importance than for the OSS/OST, so we simply took the default available with the EC2 instance types in question.  Our OST gives us a total of almost 160GB (2 x 80GB) for clients to use when they connect to our cluster.

Networking

When configuring your EC2 security group and starting your EC2 instances, be sure to open port 988 to external clients, or you will not be able to access the OSS/OST from a local workstation.
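
For example, if you manage your security groups with the AWS CLI, a rule along these lines opens the Lustre port; the security group ID and source CIDR are placeholders for your own values:

# aws ec2 authorize-security-group-ingress --group-id sg-12345678 --protocol tcp --port 988 --cidr 203.0.113.0/24

Port 988 is the default port used by LNET over TCP, so it must be reachable from wherever your clients are running.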

Installation

It’s possible to build Lustre from source, but Intel provides a series of RPM files for the RHEL family and other supported Linux server distributions.  This makes installation much more straightforward.

Rather than try to rival the excellent instructions at the HPDD Wiki, we refer you to the “Walk-thru: Deploying Lustre pre-built RPMs” on that site. Some things that we did differently are:

  • We downloaded the .rpm files directly from Intel’s site, rather than modifying our yum repository configuration.

  • We manually created the filesystem, rather than using llmount.sh.

In summary, the steps we used are as follows (a condensed sketch of the install commands appears after the list):

  1. Download and install the patched Linux kernel

  2. Download and install the Lustre software, along with supporting software (e2fsprogs, etc.)

  3. Disable SELinux

      • Per Intel, “Lustre does not play well with SELinux…”
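
For reference, the install on each server node looked roughly like the following; the exact RPM file names depend on the packages you download, so treat these as illustrative.

Download and install the patched kernel, then reboot into it:

# yum localinstall kernel-2.6.32-504.8.1.el6_lustre.x86_64.rpm
# reboot

Install the Lustre server packages and supporting software:

# yum localinstall lustre-2.7.0-*.rpm lustre-modules-2.7.0-*.rpm lustre-osd-ldiskfs-*.rpm e2fsprogs-*.rpm

Switch SELinux to permissive immediately and disable it permanently for the next boot:

# setenforce 0
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config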

At this point, we generated two EC2 Amazon Machine Images (AMIs) from our running systems, one for the server hosts and one for the client machines (should we want to use EC2 clients). This made it quick and easy to spin up new instances of Lustre components.
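
If you take the same approach, an AMI can be generated from a running instance with the AWS CLI; the instance ID and names here are placeholders:

# aws ec2 create-image --instance-id i-1a2b3c4d --name "lustre-server-2.7" --description "CentOS 6.5 with Lustre 2.7 server packages"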

Next, we built our cluster in the following way:

  1. Enable LNET

      • You may first need to ensure that the interface is set to eth0

  2. Set up the MGS, MDS, and MDT

      • Create the file system

      • Create a mount point

      • Mount the MGS/MDT file system

  3. Set up the OSS and OST

      • Create the OST file system

      • Create a mount point

      • Mount the OST file system

All details for these steps can be found in the Lustre manual; a condensed sketch of the corresponding commands is shown below.
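
As a condensed illustration of the steps above (not a copy of our exact commands), setting up a cluster like ours looks roughly like the following. The device names and mount points are placeholders, MGS_IP stands for the address of your MGS, and the file system name matches the lustrewt used later in this post.

On the MGS/MDS node, point LNET at eth0 and bring it up:

# echo "options lnet networks=tcp0(eth0)" > /etc/modprobe.d/lnet.conf
# modprobe lnet
# lctl network up

Still on the MGS/MDS node, format and mount the combined MGS/MDT:

# mkfs.lustre --fsname=lustrewt --mgs --mdt --index=0 /dev/xvdb
# mkdir -p /mnt/mdt
# mount -t lustre /dev/xvdb /mnt/mdt

On the OSS node (with the same LNET configuration in place), format the OST, registering it with the MGS, and mount it:

# mkfs.lustre --fsname=lustrewt --ost --index=0 --mgsnode=MGS_IP@tcp /dev/vg_ost/lv_ost
# mkdir -p /mnt/ost0
# mount -t lustre /dev/vg_ost/lv_ost /mnt/ost0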

Note that in order to use all the available storage on any node that you intend to be an OST, you may need to combine the individual disks into a single logical volume using the Logical Volume Manager (LVM); you can then create the Lustre file system on that volume.
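
On our OSS, for example, the two ephemeral disks could be combined along the following lines; /dev/xvdb and /dev/xvdc are placeholders for your instance’s ephemeral devices, and the result is the /dev/vg_ost/lv_ost volume used in the sketch above:

# pvcreate /dev/xvdb /dev/xvdc
# vgcreate vg_ost /dev/xvdb /dev/xvdc
# lvcreate -l 100%FREE -n lv_ost vg_ost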

Connecting a Client

Once the server components were in place, connecting a client machine with Lustre installed took only a few steps (the corresponding commands are sketched after the list):

  • Start Lustre services

  • Start LNET
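
On our workstation, with the Lustre client packages already installed, these two steps amounted to roughly the following (loading the lustre module also pulls in LNET):

# modprobe lustre
# lctl network up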

Then connecting to the cluster was a single command:

# mount -t lustre MGS_IP@tcp:/lustrewt /mnt

Here MGS_IP is the IP address of your MGS, lustrewt is the name of the file system on the MGS, and /mnt is the client mount point.
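
If you want the client to remount the file system at boot, an /etc/fstab entry along these lines (using the same placeholders) should do it; the _netdev option delays the mount until networking is available:

MGS_IP@tcp:/lustrewt   /mnt   lustre   defaults,_netdev   0 0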

Now, when you run df on the client, you’ll see the remote clustered file system:

Filesystem           Type    Size  Used Avail Use% Mounted on
/dev/mapper/vg_lustre-lv_root
                    ext4     18G  4.0G   13G  25% /
tmpfs                tmpfs   495M   80K  495M   1% /dev/shm
/dev/sda1            ext4    477M   84M  368M  19% /boot
10.40.200.142@tcp:/lustrewt
                    lustre  145G   61M  137G   1% /mnt
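
The Lustre client utilities also include lfs; running lfs df -h on the client breaks the same usage figures down per MDT and OST, which becomes handy once more OSTs are added:

# lfs df -h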

Helpful Resources

The Lustre manual and the HPDD Wiki walk-through mentioned above were particularly useful, both in preparing our Lustre cluster and in writing this post.

Next Installments

Now that we have a working Lustre cluster, we want to compare it with FhGFS/BeeGFS, mentioned in our first post.

The next posts in this series will cover:

  • FhGFS (BeeGFS) on EC2

  • Performance Findings and Comparisons of the File Systems

Glenn Street

Data Architect