Friday, April 26, 2013

Benchmarking remote data access

A while back I mentioned SkeletonKey and how it uses Parrot and Chirp to allow users to remotely access software and data.  This blog post will go through some of the benchmarking I did to see what sort of performance trade-off comes with accessing data remotely.

In order to do this, I conducted some benchmarks to compare local file access against remote access to the same filesystem through Chirp.  The tests all wrote to a system with a 3-drive RAID0 array that provided raw speeds of about 300MB/s.  Then 64 clients were set to read or write 10GB files on the array, either locally or remotely through Chirp from a nearby cluster that had a 10Gbit network connection to the server.  This was done several times; the time required to read or write each file was recorded, and a histogram of the times was created.
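The per-client measurement can be sketched roughly like this (a minimal sketch; the function name, chunk size, and timing approach here are my assumptions, not the exact harness that was used):

```python
import time

def time_read(path, chunk_size=1 << 20):
    """Sequentially read a file in chunks and return elapsed wall-clock seconds."""
    start = time.time()
    with open(path, 'rb') as f:
        # keep reading until EOF; the data itself is discarded
        while f.read(chunk_size):
            pass
    return time.time() - start
```

Each client would run something like this (or a write equivalent) against its own 10GB file, either on the local filesystem or through a Chirp mount, and report the elapsed time.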

The two plots below show the time it takes to read a 10GB file either locally from the RAID array or remotely using Chirp.  The clients accessing the files locally showed some interesting behaviour: the mean completion time was about 3025s, but one group of clients finished just under 3000s while a similar-sized group finished much faster, in about 2700s.  The clients reading the files remotely had a mean completion time of about 3050s, with completion times mostly clustered around that value.


The plots for write performance when writing 10GB to separate files are shown below.  Here the clients writing locally cluster around completion times of ~2575s.  The completion times for clients writing remotely have a mean value of ~2900s, although there's a much wider spread in completion times in this case.

Looking at the plots, it turns out that the overhead for remotely accessing data through Chirp is fairly low: roughly 10% for reading and roughly 18% for writing.  All in all, that's a fairly reasonable tradeoff for being able to access your data from other systems!

Monday, April 1, 2013

Installers and more

Before talking about performance, I'd like to talk a bit about installers.  From a development point of view, installers are for the most part not that exciting.  The bulk of an installer's code involves copying files around and possibly changing system settings.  This is all fairly standard and relatively well understood, but it has a lot of corner cases that can cause bad things to happen™ (data loss, opening up security holes, etc.).  Fortunately, there are standard tools for this, like Wise on Windows and rpms, debs, etc. on Linux.  However, SkeletonKey is targeted toward ordinary users rather than administrators and therefore can't use these, so I needed to roll my own.

Turns out that Python's batteries-included philosophy makes for a nice ad hoc installer.  SkeletonKey's installer needed to do a few things: figure out and download the latest version of the CCTools tarball, install it in a user-specified location, and finally download and install the SkeletonKey scripts.  Using the urllib2 module, the installer can download pages and tarballs from the web, using
urllib2.urlopen(url).read()
to get the HTML source for a page to scrape.  Downloading a tarball is a bit trickier, but not much more so:
import os
import tempfile
import urllib2

# download the url's contents into a uniquely named temporary file under path
(fhandle, download_file) = tempfile.mkstemp(dir=path)
url_handle = urllib2.urlopen(url)
url_data = url_handle.read(2048)
while url_data:
  os.write(fhandle, url_data)
  url_data = url_handle.read(2048)
os.close(fhandle)
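Returning to the page-scraping step mentioned above: picking the tarball links out of the downloaded HTML can be done with a regular expression.  A minimal sketch (the helper name and the href pattern are my assumptions, not the installer's actual scraping logic):

```python
import re

def find_tarballs(html):
    """Return the .tar.gz hrefs found in a page's HTML source."""
    return re.findall(r'href="([^"]+\.tar\.gz)"', html)
```

Feeding this the download page's source yields the release tarball links, from which the latest version can be picked and fetched with the loop above.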
The tarfile module also comes in handy when it's time to untar packages and examine the contents of downloaded tarballs.  The first thing is to do some inspection of the tarball.  Luckily, the installer only deals with tarballs that put everything into a single parent directory, so the following code gives us the directory the files will be extracted to:
downloaded_tarfile = tarfile.open(download_file)
extract_path = os.path.join(path, downloaded_tarfile.getmembers()[0].name)
Once we have this, we can use the extractall method to untar everything.  The tarfile module also provides some convenient tools for repackaging things into a new tarball that users can subsequently use.  There are some other things that need to be checked (in particular, extractall needs a sanity check on the tarball's member names, since otherwise someone could include an absolute path or something like ../../../../../../etc/passwd to do bad things), but the Python modules provide a lot of help with the more complicated tasks of an installer, like downloading files from the web and opening up installation packages.
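That sanity check can be sketched as a filter over the tarball's members (a hypothetical helper, not the actual SkeletonKey code): resolve each member's extraction path and skip any member that would land outside the destination directory.

```python
import os
import tarfile

def safe_members(tar, dest):
    """Yield only tar members whose extraction path stays inside dest."""
    dest = os.path.abspath(dest)
    for member in tar.getmembers():
        target = os.path.abspath(os.path.join(dest, member.name))
        # absolute names and ../ escapes resolve outside dest and are skipped
        if target.startswith(dest + os.sep):
            yield member
```

The filtered list can then be handed to extractall, e.g. downloaded_tarfile.extractall(path, members=safe_members(downloaded_tarfile, path)), so hostile member names are silently dropped instead of extracted.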