Is Your Head in the Cloud? (Part 1)

Alphabet Soup: IaaS, PaaS, SaaS, OpEx, CapEx, oh my!

Over the last few years, cloud computing has become an integral element of the IT landscape. It's impossible to discuss new IT deployments without considering how cloud services may play a role. In the Life Sciences, especially with next-generation sequencing (NGS), the cloud is regularly part of the conversation, from cloud-hosted Software-as-a-Service (SaaS) providers, to scale-out systems for augmenting existing resources, to long-term storage of sequencing data, to name a few recurring themes.

Seen by some as a panacea and by others as a "non-starter", the mere mention of the cloud can evoke strong emotions that often cloud (sorry) reasoned discourse. Since my early days working on RNA informatics for SOLiD, I've had the unique opportunity to constantly evaluate different approaches to genomics compute infrastructure, and the cloud has always been included. In this series of posts, I will explore some of the common cloud topics and discuss how cloud solutions can play a role in NGS and the Life Sciences.

Just What is the "Cloud"?

Before going much further, it probably helps to agree on what the "cloud" is. For the sake of this discussion, the cloud is any computational or storage service purchased on a pay-as-you-go model and not physically managed by your internal IT team. This definition includes "traditional" cloud vendors such as Amazon, Rackspace, and IBM's SoftLayer, but also allows for co-located "private clouds" that live on your site but are maintained by a third party.

Terms such as "Infrastructure-as-a-Service" (IaaS), "Platform-as-a-Service" (PaaS), and, of course, "Software-as-a-Service" (SaaS) help define different ways people use the cloud and help differentiate vendor services. Amazon's cloud, for instance, is best characterized as PaaS. It is essentially a platform for defining and building virtual computational and storage services based on an abstract set of components (e.g., Micro Instances, DynamoDB, S3 Buckets).

Each category of cloud services supports different types of usage scenarios. IaaS is ideal for provisioning raw hardware, PaaS helps developers build applications, and SaaS provides end users with complete solutions.

IBM's SoftLayer is IaaS. When setting up nodes and storage on SoftLayer, there's a 1:1 correspondence between the hardware you select and the hardware you get. With IaaS, you define your infrastructure at the hardware level and treat it in the same way you would locally procured hardware. IBM's Bluemix is a PaaS model built on top of SoftLayer. Bluemix provides a set of services that you can use to build applications (e.g., databases, web servers, Watson, etc.). Instead of provisioning hardware, Bluemix provisions services.

The cloud services most people are familiar with are actually SaaS applications built on IaaS or PaaS. Salesforce is the classic example of a modern cloud-based SaaS product. It is implemented entirely on cloud services. It is a monolithic "multi-tenant" application where all users share the same software instance (there is just one Salesforce application) and different users' data is segmented by software policies. In genomics, DNAnexus and Seven Bridges Genomics both offer SaaS products.
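To make the "multi-tenant" idea concrete, here is a minimal sketch in Python. It is purely illustrative: the class name, method names, and tenant IDs are all hypothetical, and it says nothing about how Salesforce or any genomics SaaS vendor actually implements tenancy. It only shows the basic shape: one shared application instance, with each customer's data separated by a software rule rather than by separate hardware.

```python
class MultiTenantStore:
    """One shared application instance; every record is tagged with its tenant.

    Hypothetical illustration only, not any vendor's real design.
    """

    def __init__(self):
        # All tenants' records live in the same underlying collection.
        self._records = []

    def add(self, tenant_id, record):
        # Tag each record with the tenant that owns it.
        self._records.append((tenant_id, record))

    def query(self, tenant_id):
        # The "software policy": callers only ever see their own tenant's rows.
        return [record for owner, record in self._records if owner == tenant_id]


# Two hypothetical labs sharing the same application instance.
store = MultiTenantStore()
store.add("lab_a", "sample-001.fastq")
store.add("lab_b", "sample-777.fastq")
store.add("lab_a", "sample-002.fastq")

lab_a_files = store.query("lab_a")
lab_b_files = store.query("lab_b")
```

In a real SaaS product the same principle is enforced with authentication, access controls, and auditing, but the key point is the same: the separation between customers lives in software, not in separate machines.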

Looking at the cloud in terms of different service offerings helps one understand the different ways it can be leveraged for NGS applications. But, before we look at the technical side of things, it's worth taking a short side trip into the world of accounting (I promise this won't be as boring as it sounds!).

The Cloud as an Operating Expense

One of the most important aspects of cloud solutions is that, from an accounting perspective, they can be treated as operating expenses (OpEx) instead of capital expenses (CapEx). Compute resources – servers, workstations, clusters, storage – when deployed on-site are almost always capital expenses. They are physical units with upfront costs that depreciate over time. Initial costs are often high and thus a point of contention between different interests competing for the same budget dollars. Once deployed, they also have associated operating expenses such as admin salaries, power, and spare parts.

Cloud services, on the other hand, are pay-as-you-go operating expenses. With cloud services, you pay for what you need when you need it. The initial impact on the books is much smaller than the CapEx associated with hardware purchases. However, over time those costs can become much larger than the combined CapEx and OpEx associated with owning the resources.

When considering prices for cloud vs. local resources, it's very important to develop a clear financial model based on realistic usage scenarios. This should include everything from the cost of power all the way up to the software you may have to purchase. Since cloud services are often billed based on usage and specific capacity metrics (e.g., number of registered users), accurate and honest estimates for both current and planned growth are essential for predicting your costs.
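As a starting point for that kind of model, here is a rough sketch in Python that compares cumulative costs and finds the month at which cloud spending overtakes ownership. Everything in it is an assumption made for illustration: the class and function names are hypothetical, the dollar figures are invented, and a real model would add depreciation schedules, software licenses, hardware refresh cycles, and per-user or per-terabyte pricing tiers.

```python
from dataclasses import dataclass


@dataclass
class OnPremPlan:
    upfront_capex: float  # one-time hardware purchase (CapEx)
    monthly_opex: float   # power, admin time, spare parts (OpEx)

    def cumulative_cost(self, months: int) -> float:
        return self.upfront_capex + self.monthly_opex * months


@dataclass
class CloudPlan:
    monthly_fee: float           # pay-as-you-go charges in the first month
    monthly_growth: float = 0.0  # assumed fractional usage growth per month

    def cumulative_cost(self, months: int) -> float:
        # Sum each month's bill, letting usage compound month over month.
        return sum(
            self.monthly_fee * (1 + self.monthly_growth) ** m
            for m in range(months)
        )


def break_even_month(on_prem, cloud, horizon_months=120):
    """Return the first month in which cumulative cloud cost exceeds
    cumulative on-premises cost, or None if that never happens
    within the horizon."""
    for month in range(1, horizon_months + 1):
        if cloud.cumulative_cost(month) > on_prem.cumulative_cost(month):
            return month
    return None


# Invented numbers, for illustration only.
owned = OnPremPlan(upfront_capex=60_000, monthly_opex=1_000)
heavy_cloud_use = CloudPlan(monthly_fee=3_500)
light_cloud_use = CloudPlan(monthly_fee=500)

heavy_break_even = break_even_month(owned, heavy_cloud_use)  # 25
light_break_even = break_even_month(owned, light_cloud_use)  # None
```

With these made-up figures, the heavy-usage cloud plan becomes more expensive than owning the hardware in month 25, while the light-usage plan never does within ten years. The point is not the specific numbers but that the answer depends entirely on the usage estimates you feed in, which is why honest estimates matter.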

The Cloud at Lab7

In each of these cloud posts, I'll include a short discussion of how the topics apply at Lab7 Systems. For today's post, let's look at how we use the cloud across our business.

At Lab7, we primarily use IaaS and SaaS cloud services. Most of our core business functions run on SaaS products. We use Google Business for email, calendars, and documents (Apps and Drive). We use Zoho for our CRM. We use Atlassian's Confluence and Jira for our internal wiki and issue tracking and BitBucket for our source code repository.

These are all services that we could host on our own, but by using SaaS products we avoid the additional admin tasks and overhead of maintaining servers and software. We can also access these services from anywhere and never have downtime issues. Of course, none of these services is terribly expensive (about $150/month total for all of them), so it was an easy decision to make.

For our development and demo servers, we use a mix of IaaS servers from IBM's SoftLayer and internal servers. For quickly bringing up new servers to demo the Lab7 ESP or test new features, SoftLayer provides a mix of features that makes it easy to match real-world scenarios. With SoftLayer, we can provision exactly the hardware we need, be it a 4-core node with 8 GB of RAM and three 2 TB SATA disks, or an 8-node cluster with 128 GB/node connected to 50 TB of RAID5 storage. This flexibility makes SoftLayer's IaaS model perfect for quickly testing different scenarios and "right-sizing" our long-running demo servers. Of course, these deployment options are also available for our customers (but let's not get ahead of the discussion...).

In-house, we have a dedicated POWER8 server with 512 GB of RAM, 160 cores, and 15 TB of storage and a 4-node Dell VRTX cluster. We use these primarily for testing genomics workflows. Provisioning true high-performance servers and clusters is not possible with most cloud providers, and for those that do support it, managing the cost can be challenging. Given our regular usage, owning our own servers and clusters makes sense for us.

As you can see, at Lab7 we use a mix of cloud and local resources. For each service, we've weighed the pros and cons of different approaches and selected solutions that match our business needs and technical expertise.

In the next post, we'll dive into the technical aspects of cloud computing in the context of scientific and NGS workloads. We'll look at compute, storage, and networking issues, how they impact IaaS and SaaS solutions, and how to reason about them when considering cloud solutions for genomics. In the last post, we'll look at some of the cultural and regulatory issues around using cloud services for sequencing and conclude with some overall thoughts on how the cloud will evolve for genomics.
