Thursday, May 23, 2013

UC3 Jamboree thoughts

The UC3 group here held a UC3 users' jamboree today.  Part of the preparation involved putting together exercises for users to work through.  It's surprising how many issues pop up in code that you thought was solid once you start exercising everything.  Part of this can be caught with a test suite and/or continuous integration, but that won't get everything, and users will still find the corner cases that you miss.  That's unfortunate if you want to shield users from potentially confusing errors, but that's the way it is.

The other interesting part of the Jamboree was getting the opportunity to do a little bit of Octave programming.  I've never had to use Matlab or Octave, so it was a bit new, but it's sufficiently C-like that it wasn't too hard to pick up.  Lack of time prevented me from putting together a Monte Carlo analysis of time series data, though, so I went with some simple matrix operations instead (matrix multiplication, calculating eigenvectors and eigenvalues).  The original Monte Carlo idea is something I'll probably come back to, possibly as part of a larger example that uses Swift to handle generating the samples and coordinating their analysis.
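For the curious, the matrix part of the exercise boils down to something like the following, sketched here in Python/NumPy rather than Octave since that's the language used elsewhere on this blog; the random matrix is just a stand-in for the exercise data:

import numpy as np

A = np.random.rand(4, 4)
B = A.dot(A.T)                                # matrix multiplication
eigenvalues, eigenvectors = np.linalg.eig(B)  # eigen decomposition
print(eigenvalues)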

Tuesday, May 7, 2013

User Engagement with SPT and Double Chooz

At a certain point in the development cycle, the best way to improve and move a project forward is to get user feedback, both to catch bugs in situations that you didn't anticipate and to hear from people using the software in a variety of situations. For SkeletonKey, this is taking place through engagement with the Double Chooz and South Pole Telescope (SPT) groups here, with the goal of allowing them to run computations on UC3 and eventually OSG.

After discussions with users from both Double Chooz and SPT, it's quickly become apparent that the biggest issues preventing their use of UC3 and other resources are software and data access. With regard to software access, both groups have moderately large software stacks (~2-5GB) that need to be available in order for them to run their computations. Although 5GB of software can easily be transferred using scp or something similar for one-off computations, this quickly becomes unmanageable when scaling up to hundreds or thousands of jobs using that software. Even with 10Gbit network connectivity, transferring a terabyte of software is time consuming and a terribly inefficient use of bandwidth! CVMFS comes to the rescue here: jobs transfer only the portions of the software they actually access, and a Squid proxy minimizes the data transferred and keeps the bandwidth utilization on local networks rather than on the backbones. We're currently in the process of installing their software in a CVMFS repository and updating their workflows to use SkeletonKey to access it remotely.

There are also some overarching themes in the data access problems that both Double Chooz and SPT have. They both need a way to stage data in and out of the systems that are doing the computations. There are some slight differences between the requirements of each group (Double Chooz's workflow would be able to work with the non-POSIX semantics of the UC3 HDFS filesystem, while SPT's workflow can't), but the primary issues are the same. We're planning on using SkeletonKey to solve this as well. By using Chirp and Parrot under the hood, both projects will be able to use data residing on their own systems from jobs running on other clusters, removing the need to stage data in and out for each job.

Friday, April 26, 2013

Benchmarking remote data access

A while back I mentioned SkeletonKey and how it uses Parrot and Chirp to allow users to remotely access software and data.  This blog post will go through some of the benchmarking that I did to see what sort of performance trade-offs come with accessing data remotely.

In order to do this, I ran some benchmarks to see how performance differs between local file access and remote access to the same filesystem through Chirp.  The tests all ran against a system with a 3-drive RAID0 array that provided raw speeds of about 300MB/s.  Then 64 clients were set to read or write 10GB files on the array, either locally or remotely through Chirp from a nearby cluster that had a 10Gbit network connection to the server.  This was done several times, the time required to read or write each file was recorded, and a histogram of the times was created.
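For context, each client was doing essentially this kind of work (a minimal sketch, not the actual benchmark script; the chunk size and path are placeholders):

import os
import time

CHUNK = 64 * 1024 * 1024      # read/write in 64MB chunks
TOTAL = 10 * 1024 ** 3        # 10GB per client

def write_test(path):
    buf = b'\0' * CHUNK
    start = time.time()
    handle = open(path, 'wb')
    written = 0
    while written < TOTAL:
        handle.write(buf)
        written += CHUNK
    handle.close()
    return time.time() - start

def read_test(path):
    start = time.time()
    handle = open(path, 'rb')
    while handle.read(CHUNK):
        pass
    handle.close()
    return time.time() - start

print(write_test('/mnt/raid/benchmark.dat'))   # placeholder path; for the remote case this is a Chirp-exported path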

The two plots below show the time it takes to read a 10GB file from the RAID array locally or remotely using Chirp.  The clients accessing the files locally showed some interesting behaviour: the mean time for completion was about 3025s, but there was a group of clients completing in just under 3000s and a similar number of clients completing much faster, in about 2700s.  The clients reading the files remotely had a mean time for completion of about 3050s, and their completion times were mostly clustered around this value.


The plots for write performance when writing 10GB to separate files are shown below. Here the clients writing locally cluster around completion times of ~2575s.  The completion times for clients writing remotely have a mean value of ~2900s, although there's a much wider spread in completion times in this case.
Looking at the plots, it turns out that the overhead for remotely accessing data through Chirp is fairly low: roughly 10% for reading and roughly 18% for writing.  All in all, it's a fairly reasonable tradeoff for being able to access your data from other systems!

Monday, April 1, 2013

Installers and more

Before talking about performance, I'd like to talk a bit about installers.  From a development point of view, installers are for the most part not that exciting.  The bulk of an installer's code involves copying files around and possibly changing system settings.  This is all fairly standard and relatively well understood, but it has a lot of corner cases that can cause bad things to happen™ (data loss, opening up security holes, etc.).  Fortunately, there are standard installer systems for this, like Wise on Windows and rpm, deb, etc. on Linux.  However, SkeletonKey is targeted toward users and therefore can't use these, so I needed to roll something myself.

It turns out that Python's batteries-included philosophy makes for a nice ad hoc installer. SkeletonKey's installer needed to do a few things: figure out and download the latest version of the CCTools tarball, install it in a user-specified location, and finally download and install the SkeletonKey scripts.  Using the urllib2 module, the installer can download pages and tarballs from the web, using
urllib2.urlopen(url).read()
to get the HTML source of a page for scraping.  Downloading a tarball is a bit trickier, but not by much:
import os
import tempfile
import urllib2

# download the tarball in 2KB chunks to a temporary file in the install path
(fhandle, download_file) = tempfile.mkstemp(dir=path)
url_handle = urllib2.urlopen(url)
url_data = url_handle.read(2048)
while url_data:
  os.write(fhandle, url_data)
  url_data = url_handle.read(2048)
url_handle.close()
os.close(fhandle)
The tarfile module also comes in handy when it's time to untar packages and examine the contents of downloaded tarballs.  The first thing is to do some inspection of the tarball.  Luckily, the installer only deals with tarballs that put everything into a single parent directory, so the following code gives us the directory the files will be extracted to:
# the first member is the single top-level directory everything extracts into
downloaded_tarfile = tarfile.open(download_file)
extract_path = os.path.join(path, downloaded_tarfile.getmembers()[0].name)
Once we have this, we can use the extractall method to untar everything. The tarfile module also provides some convenient tools for repackaging things into a new tarball that users can then use.  There are some other things that need to be checked (in particular, extractall needs some sanity checking of the tarball it's given, otherwise someone can put an absolute path or something like ../../../../../../etc/passwd in there to do bad things), but the Python modules provide a lot of help with the more complicated tasks of an installer, like downloading files from the web and opening up installation packages.
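As an illustration of that sanity check, here's roughly the kind of guard you'd want in front of extractall (a sketch only; the check in the actual installer may differ, and safe_extract is just an illustrative helper name):

import os
import tarfile

def safe_extract(tar_path, dest):
    # refuse to extract any member whose resolved path falls outside dest
    dest = os.path.abspath(dest)
    tar = tarfile.open(tar_path)
    for member in tar.getmembers():
        target = os.path.abspath(os.path.join(dest, member.name))
        if not target.startswith(dest + os.sep):
            raise ValueError("unsafe path in tarball: %s" % member.name)
    tar.extractall(dest)
    tar.close()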

Monday, March 18, 2013

Playing around with remote software and data access

Playing around with SkeletonKey, Parrot, and Chirp

One of the projects I've been working on is called SkeletonKey.  SkeletonKey is a tool that lets people create scripts that will run their applications in an environment that allows for remote software and data access.  Just as an example, suppose you're interested in analyzing the temperature data from the last 100 years and generating some graphics based on it.  You could download all of the data to your computer and then run a program to go through all of the text.  After waiting a while, you'd get your results back.

Or, if you had access to a cluster, you could split the task up, submit it to a few hundred cores, and get the results back much more quickly than by running the application on your personal computer.  The only problem is that you may have terabytes of data and your application may be a few gigabytes in size, and you'd rather not have to transfer all of this over to the cluster and then convince the administrators that they should install your application so that you can use it.

That's where Parrot and Chirp come in.  Chirp allows you to export data from your system to remote computers over the internet.  Parrot lets you run your application in an environment that intercepts local file access and transparently turns it into remote network access.  This is all done in user space, so you can even use Parrot to run a shell script that then runs an application that's actually located elsewhere.  So if you can run a shell script and access the web, you can run an application and get read/write access to data from remote sources, without having to install a bunch of libraries and binaries or transfer large amounts of data in to do your work.
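To give a flavor of what that looks like from the application's point of view: when a script is launched under parrot_run, ordinary file operations on /chirp/... paths get turned into remote Chirp requests, so something like the following just works (the hostname and path here are made up for illustration):

# only meaningful when run under parrot_run; data.example.org is a
# hypothetical host running a chirp_server that exports the data
handle = open('/chirp/data.example.org/temperature/records.dat')
for line in handle:
    pass   # analyze each record as if the file were local
handle.close()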

SkeletonKey works with Parrot and Chirp to generate a shell script that does all the legwork for you.  That is, you give SkeletonKey a simple INI file with your configuration, and it'll generate a shell script that downloads your application as well as the appropriate Parrot binaries and then runs your application in a Parrot environment that has all of the software and data access you may want.

I have more information on how well this compares to accessing your files or data locally but that'll need to wait until the next entry.