Distributed File Systems in the Cloud - Part 2
In our first post we discussed some background on distributed file systems (DFS) and parallel distributed file systems (PDFS). In this post, we’ll discuss our experience in setting up Lustre on Amazon EC2.
Our goal was to build a simple Lustre cluster that allowed us to use EC2 ephemeral storage. This storage is “free” in the sense that, unlike EBS volumes, ephemeral storage is included in the base configuration of our EC2 instances at no additional cost. We were already running quite a bit of our infrastructure on EC2, but we were not using the ephemeral storage attached to our instances; instead, we relied on Amazon’s EBS volumes for storage. A Lustre file system built on ephemeral storage can replace those volumes.
If you’re not familiar with Lustre architecture and the various components of a Lustre cluster, please see the excellent documentation in the Lustre manual, which is available online. In particular, refer to Section 1.1, “Understanding Lustre Architecture”.
We used the current stable feature release of Lustre, 2.7, which was released on March 17, 2015. We deployed the Lustre server components on CentOS 6.5, with a patched 2.6.32 kernel (2.6.32-504.8.1.el6_lustre.x86_64).
Our very simple Lustre cluster consisted of a co-located Management Server (MGS) and Metadata Server (MDS), with associated storage for the Management Target (MGT) and Metadata Target (MDT). We started with a single Object Storage Server (OSS) and its disk storage for the Object Storage Target (OST). This simple design is shown below.
We could add storage capacity simply by including more OSTs or OSS/OST combinations; these can even be added after the initial cluster is up and running.
The “Workstation” was a CentOS 6.6 desktop running in our Danvers office with the following configuration:
1 dual-core CPU @ 3.40GHz
1 GB RAM
20 GB local storage
Of course, the workstation’s local storage is not what we’re interested in!
A more robust implementation would be to place the MGS on a separate server from the MDS, with separate storage for each server, but the layout above is sufficient for our current needs. For further discussion and ideas, see the “Lustre Cluster” section of the manual (in particular “Figure 1.2. Lustre cluster at scale”).
EC2 provides many different types of hosts, with differing CPU, RAM, and networking characteristics. A key component of Lustre is LNET, the networking layer that “glues” everything together in a Lustre deployment. We chose EC2 instance types with “High” network performance.
The table below shows the EC2 instances used in this evaluation.
The storage available to the MGS/MDS and clients is, of course, less important than the storage for the OSS/OST, so we simply took the defaults for the EC2 instance types in question. Our OST gives clients a total of almost 160GB (2 x 80GB) to use when they connect to our cluster.
When configuring your EC2 security group and starting your EC2 instances, be sure to open port 988 to external clients, or you will not be able to access the OSS/OST from a local workstation.
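For example, using the AWS CLI, the rule might look like the following (the security group ID and client CIDR here are placeholders for your own values):

aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 988 --cidr 203.0.113.0/24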
It’s possible to build Lustre from source, but Intel provides a series of RPM files for the RHEL family and other supported Linux server distributions. This makes installation much more straightforward.
Rather than try to rival the excellent instructions at the HPDD Wiki, we refer you to the "Walk-thru: Deploying Lustre pre-built RPMs" on that site. Some things that we did differently are:
We downloaded the .rpm files directly from Intel’s site, rather than modifying our yum repository configuration.
We manually created the filesystem, rather than using llmount.sh.
In summary, the steps we used are as follows:
Download and install the patched Linux kernel
Download and install the Lustre software, along with supporting software (e2fsprogs, etc.)
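A rough sketch of those two steps on a server node follows; the package file names are abbreviated with wildcards, and the exact set of RPMs may differ from what you download:

# Install the patched kernel, then reboot into it
rpm -ivh kernel-2.6.32-504.8.1.el6_lustre.x86_64.rpm
reboot
# Install the supporting tools and the Lustre server packages (names abbreviated)
rpm -Uvh e2fsprogs-*.rpm
rpm -ivh lustre-modules-2.7.0-*.rpm lustre-osd-ldiskfs-2.7.0-*.rpm lustre-2.7.0-*.rpm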
At this point, we generated two EC2 Amazon Machine Images (AMIs) from our running systems, one for the server hosts and one for the client machines (should we want to use EC2 clients). This made it quick and easy to spin up new instances of Lustre components.
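If you take the same approach, an AMI can be generated from a running instance with a single AWS CLI call (the instance ID and image name here are placeholders):

aws ec2 create-image --instance-id i-xxxxxxxx --name "lustre-server-2.7"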
Next, we built our cluster in the following way:
Set up the MGS, MDS, and MDT
Create the file system
Create a mount point
Mount the MGS/MDT file system
Set up the OSS and OST
Create the OST file system
Create a mount point
Mount the OST file system
All details for these steps can be found in the Lustre manual.
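As a condensed sketch of those steps, modeled on the examples in the manual (the device names /dev/xvdb and /dev/xvdg and the mount points are assumptions; substitute your own):

# On the MGS/MDS node: a combined MGT/MDT on a single device
mkfs.lustre --fsname=lustrewt --mgs --mdt --index=0 /dev/xvdb
mkdir -p /mnt/mgsmdt
mount -t lustre /dev/xvdb /mnt/mgsmdt

# On the OSS node: create the OST and point it at the MGS (MGS_IP is your MGS address)
mkfs.lustre --fsname=lustrewt --ost --mgsnode=MGS_IP@tcp --index=0 /dev/xvdg
mkdir -p /mnt/ost0
mount -t lustre /dev/xvdg /mnt/ost0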
Note that in order to use all the available storage on any node that you intend to be an OST, you may have to create a logical volume using a logical volume manager (LVM). In this way, the individual disks present as a single volume, on which you can then create the Lustre file system.
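A minimal LVM sketch, assuming the OSS has two ephemeral disks at /dev/xvdg and /dev/xvdh (the device and volume names are placeholders):

pvcreate /dev/xvdg /dev/xvdh           # register both disks as physical volumes
vgcreate vg_ost /dev/xvdg /dev/xvdh    # group them into a single volume group
lvcreate -l 100%FREE -n lv_ost vg_ost  # one logical volume spanning all free space

The mkfs.lustre command for the OST, shown in the sketch above, would then target /dev/vg_ost/lv_ost instead of a raw device.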
Once the server components were in place, connecting a client machine that had Lustre installed to the cluster needed only a few steps:
Start Lustre services
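On CentOS 6 this amounts to loading the Lustre client kernel modules; a minimal sketch (your setup may use an init script instead):

modprobe lustre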
Then connecting to the cluster was a single command:
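mount -t lustre MGS_IP@tcp:/lustrewt /mnt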
Where MGS_IP is the IP address of your MGS, lustrewt is the name of the file system on the MGS, and /mnt is the client mount point.
Now, when you run df -hT on the client, you’ll see the remote clustered file system:
Filesystem  Type    Size  Used Avail Use% Mounted on
            ext4     18G  4.0G   13G  25% /
tmpfs       tmpfs   495M   80K  495M   1% /dev/shm
/dev/sda1   ext4    477M   84M  368M  19% /boot
            lustre  145G   61M  137G   1% /mnt
Now that we have a working Lustre cluster, we want to compare it with FhGFS/BeeGFS, mentioned in our first post.
The next posts in this series will cover:
FhGFS (BeeGFS) on EC2
Performance Findings and Comparisons of the File Systems