At a certain point in the development cycle, the best way to improve a project and move it forward is to get user feedback, both to catch bugs in situations that you didn't anticipate and to hear from people using the software in a variety of settings. For SkeletonKey, this is taking place through engagement with the Double Chooz and South Pole Telescope (SPT) groups here in order to allow them to run computations on UC3 and eventually OSG.
After discussions with users from both Double Chooz and SPT, it has quickly become apparent that the biggest issues preventing their use of UC3 and other resources are software access and data access. With regard to software access, both groups have moderately large software stacks (~2-5GB) that need to be available in order for them to run computations. Although 5GB of software can easily be transferred using scp or something similar for one-off computations, this quickly becomes unmanageable when scaling up to hundreds or thousands of jobs using this software. Even with 10Gbit network connectivity, transferring a terabyte of software is time consuming and a terribly inefficient use of bandwidth! CVMFS comes to the rescue here by allowing jobs to transfer only the portions of the software that they actually access, and by utilizing a squid proxy to minimize the data transfer and push the bandwidth utilization to local networks rather than to backbones. We're currently in the process of installing their software on a CVMFS repository and updating their workflows to use SkeletonKey to remotely
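To put rough numbers on the scaling problem, here is a back-of-the-envelope sketch. It assumes every job pulls its own full copy of the stack, decimal units (1GB = 10^9 bytes), and a 10Gbit link running at its ideal line rate with no protocol overhead or contention. The function name and the job counts are just illustrative choices, not measurements from either group.

```python
def transfer_seconds(stack_gb: float, jobs: int, link_gbit: float) -> float:
    """Ideal time to ship one full copy of the stack to every job.

    Assumes decimal units and a link running at full line rate,
    which is optimistic for any shared network.
    """
    total_bits = stack_gb * jobs * 1e9 * 8   # GB -> bytes -> bits
    return total_bits / (link_gbit * 1e9)    # Gbit/s -> bit/s


if __name__ == "__main__":
    stack_gb = 5      # upper end of the ~2-5GB stacks mentioned above
    link_gbit = 10    # 10Gbit connectivity
    for jobs in (1, 200, 1000):   # hypothetical job counts
        secs = transfer_seconds(stack_gb, jobs, link_gbit)
        total_tb = stack_gb * jobs / 1000
        print(f"{jobs:>5} jobs: {total_tb:.3f} TB, {secs / 60:.1f} min at line rate")
```

Even this best case keeps the link saturated for the whole transfer, and in practice the link is shared with everyone else, so the real cost is higher. Fetching only the files a job touches, through a local cache, avoids most of that traffic.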
There are also some common themes in the data access problems that Double Chooz and SPT have. Both need a way to stage data in and out of the systems that are doing the computations. There are some slight differences between the requirements of each group (Double Chooz's workflow would be able to work with the non-POSIX semantics of the UC3 HDFS filesystem, while SPT's workflow can't), but the primary issues are the same. We're planning on using SkeletonKey to solve this as well. By using Chirp and Parrot under the hood, both projects will be able to use data that resides on their own systems in jobs running on other clusters, removing the need to stage data in and out for jobs.