Friday, April 26, 2013

Benchmarking remote data access

A while back I mentioned SkeletonKey and how it uses Parrot and Chirp to let users remotely access software and data.  This blog post goes through some of the benchmarking I did to see what sort of performance trade-offs are involved in accessing data remotely.
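As a quick illustration of what "remote access" means here (the hostname and paths below are made up for the example), Parrot lets an unmodified program see a Chirp server as part of the local filesystem, while the Chirp command-line tools copy files explicitly:

```shell
# Transparently access a Chirp server through Parrot's /chirp namespace
# (server.example.com and the paths are hypothetical)
parrot_run ls /chirp/server.example.com/data

# Or copy files explicitly with the Chirp command-line tools
chirp_put localfile.dat server.example.com remotefile.dat
chirp_get server.example.com remotefile.dat copy.dat
```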

To do this, I conducted some benchmarks comparing local file access against remote access to the same filesystem through Chirp.  The tests all wrote to a system with a three-drive RAID0 array that provided raw speeds of about 300MB/s.  Then 64 clients were set to read or write 10GB files on the array, either locally or remotely using Chirp from a nearby cluster with a 10Gbit network connection to the server.  This was done several times; the time required to read or write each file was recorded, and histograms of the times were created.
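The analysis step is straightforward. Here is a minimal sketch of it (the function name and the sample times are my own for illustration), assuming the per-client completion times have already been collected into a list:

```python
from statistics import mean

def summarize(times, bin_width=100):
    """Return the mean completion time and a simple histogram.

    times: per-client completion times in seconds.
    bin_width: histogram bin size in seconds.
    """
    counts = {}
    for t in times:
        bucket = int(t // bin_width) * bin_width  # left edge of the bin
        counts[bucket] = counts.get(bucket, 0) + 1
    return mean(times), sorted(counts.items())

# Hypothetical completion times for a handful of clients (seconds)
times = [2700, 2710, 2950, 2990, 3020, 3040]
avg, hist = summarize(times)
print(avg)   # mean completion time
print(hist)  # (bin start, client count) pairs
```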

The two plots below show the time it takes to read a 10GB file from the RAID array locally or remotely using Chirp.  The clients accessing the files locally showed some interesting behaviour: the mean completion time was about 3025s, but one group of clients finished just under 3000s while a similar number finished much faster, in about 2700s.  The clients reading the files remotely had a mean completion time of about 3050s, with completion times mostly clustered around that value.


The plots for write performance when writing 10GB to separate files are shown below.  Here the clients writing locally cluster around completion times of ~2575s.  The completion times for clients writing remotely have a mean of ~2900s, although there's a much wider spread in completion times in this case.
Looking at the plots, it turns out that the overhead for remotely accessing data through Chirp is fairly low: about 10% for reading and about 18% for writing.  All in all, that's a fairly reasonable trade-off for being able to access your data from other systems!
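To actually share a directory this way, one possible setup (the directory path and hostname below are hypothetical) is to run a Chirp server over the directory and point clients at it:

```shell
# Export /data/shared through a Chirp server (path is hypothetical)
chirp_server -r /data/shared

# From another machine, browse and fetch the exported files
chirp server.example.com ls /
chirp_get server.example.com results.dat results.dat
```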
