Archive for the ‘rackspace’ Category

The relational database is up against its limits on big sites

Thursday, September 24th, 2009

Ten years ago, when I was getting serious about programming, I read Philip Greenspun’s book about websites and databases. Greenspun does a great job of explaining some of the thinking, and even traditions, that shaped relational databases. I read that book and formed certain opinions about databases, for instance, that a database should not store the same info in multiple places (that is, the database should be normalized). You would be a fool to store a user’s phone number in 3 different places, because when they change their phone number, there is a risk that one database table will update but another won’t, and then you have inconsistent data about the user.
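To make the phone number example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample data are made up for illustration; they are not from Greenspun's book. The phone number lives only in the users table, and other tables refer to the user by id, so a single UPDATE is enough to change it everywhere.

```python
import sqlite3

# In-memory database; the schema and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        item TEXT
    );
    INSERT INTO users VALUES (1, 'Alice', '555-0100');
    INSERT INTO orders VALUES (10, 1, 'book');
    INSERT INTO orders VALUES (11, 1, 'lamp');
""")

# The phone number is stored once, so one UPDATE changes it for every order.
conn.execute("UPDATE users SET phone = '555-0199' WHERE id = 1")

# Orders look up the phone number through a join instead of keeping a copy.
rows = conn.execute("""
    SELECT o.item, u.phone
    FROM orders o JOIN users u ON u.id = o.user_id
    ORDER BY o.id
""").fetchall()
print(rows)  # [('book', '555-0199'), ('lamp', '555-0199')]
```

Had the orders table kept its own copy of the phone number, the same change would need an UPDATE per table, and forgetting one is exactly how the data becomes inconsistent.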

Over the last 3 years, I’ve grown more and more fascinated by articles arguing for a new style of storing data:

Distributed vs. Relational Databases

Traditional relational databases are 30 years old, are well understood and have a huge ecosystem of tools around them. For that reason, it’s a compelling option when building your application. Postgres, MySQL, and Oracle are all relational databases modeling a schema on entities and relations between those entities. That’s a good, powerful programming model with interesting theoretical properties. But companies with large amounts of data have already gone past what you can reasonably fit on a single machine, even on high-end hardware, and it’s provably impossible to keep the traditional relational model, in particular the ACID properties, while scaling across multiple machines. Even if you’re willing to give up availability, scaling reads (via caching and replication) is difficult with relational databases, and scaling writes by partitioning is either very expensive, very painful from an application programming and operations standpoint, or both.

Cassandra is taking the approach that, given that you’re going to have to give up some parts of the relational model to scale, let’s start over and rethink things. Let’s add things like transparent replication and failover, built-in partitioning and load balancing, multiple data center support, and the ability to add capacity without ever disturbing applications running against the database.

Amazon’s EC2 vs Mosso

Monday, July 13th, 2009

We are about to launch a new project on Amazon’s EC2, so I’m looking for information about it. Jay Kuri has posted an in-depth comparison of EC2 and Mosso that is well worth reading:

One place where Amazon’s offering shines is in additional storage. Amazon offers EBS (Elastic Block Service) which lets you create large blocks of filesystem-level storage which can be mounted onto any instance. When configured this way, you pay per Gigabyte allocated. This is very important to those sites which require a significant amount of storage. Mosso does not offer mountable additional storage.

…Both Amazon’s EC2 and Mosso’s cloud-servers require you to pay for the bandwidth you use. In both cases, their costs are reasonable. Amazon is a bit cheaper here at $0.17 per gigabyte, compared to Mosso’s $0.22. Amazon’s prices drop as your bandwidth usage increases, but with the first drop at 10 Terabytes, not a lot of people are likely to notice that difference.

Interestingly, for inbound traffic, Mosso beats out Amazon with a rate of $0.08 per gigabyte to Amazon’s $0.10. This can make a difference if your service takes a lot of uploads or you do a lot of content aggregation. Generally speaking, however, unless your application has a somewhat inverted traffic pattern to most sites, this won’t make a huge difference for you.

…There is one difference between cloud servers and EC2 that can be extremely important. That is that Amazon enforces a strict 5-public-ip rule. This means that your company can only have 5 dedicated public IPs, regardless of how many servers you have. Note that I did not say application, this is a per-EC2-account rule. This can be a problem if your company hosts multiple sites, or if you have other public-IP requirements (such as SSL certificates for multiple sites for example.) Mosso does not have such a restriction. At Mosso, you have one public IP for every instance you have.

Mosso also provides reverse-DNS to your own domain, which can be very important for things such as email delivery. EC2 does not provide any method to edit reverse DNS. Among other things, this makes EC2 not feasible for anything requiring outbound email delivery. You can get around this by having a non-EC2 relay host for outbound email, but in many ways requiring an external host defeats the purpose of working within the cloud in the first place.

…On the whole Amazon EC2 and Mosso are very close competitors. Amazon loses out on CPU power per dollar spent, but Mosso costs a bit more per gigabyte of RAM. Mosso loses out on ‘extra storage’ but wins on general IO speed. Mosso has the advantage on individual server upgrades with its instant-upgrade features, but Amazon wins on large-scale deployment because of its API and solid auto-scaling options.

So which do you choose? If your application is particularly memory bound, requires a huge amount of disk space, or if you are at the higher end in terms of clustering requirements, Amazon’s EC2 is a good solution.

If that is not the case, however, EC2 just doesn’t hold up, cent for cent, against Mosso’s offerings. In my opinion, the combination of lower cost, better base CPU/RAM options and a smoother upgrade path make Mosso’s cloud-servers the clear winner.
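The bandwidth rates quoted in the review are easy to plug into a quick calculation. The sketch below uses only the per-gigabyte prices given above ($0.17 out and $0.10 in for Amazon, $0.22 out and $0.08 in for Mosso). It assumes flat rates, which should hold below the 10 Terabyte mark where Amazon's first price drop kicks in, and the traffic volumes are invented for illustration.

```python
# Per-gigabyte bandwidth prices as quoted in the review (2009 figures).
RATES = {
    "Amazon EC2": {"out": 0.17, "in": 0.10},
    "Mosso": {"out": 0.22, "in": 0.08},
}

def monthly_cost(provider, gb_out, gb_in):
    """Flat-rate bandwidth cost in dollars; ignores Amazon's volume discounts."""
    rate = RATES[provider]
    return round(gb_out * rate["out"] + gb_in * rate["in"], 2)

# A typical site: far more outbound than inbound traffic (hypothetical volumes).
for name in RATES:
    print(name, monthly_cost(name, gb_out=500, gb_in=50))
# Amazon EC2 90.0
# Mosso 114.0

# An "inverted" site that mostly receives uploads (hypothetical volumes).
for name in RATES:
    print(name, monthly_cost(name, gb_out=50, gb_in=500))
# Amazon EC2 58.5
# Mosso 51.0
```

With these made-up numbers, Amazon comes out cheaper for the ordinary traffic pattern and Mosso only pulls ahead when uploads dominate, which matches the review's point about inverted traffic.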

Using cloud services instead of dedicated servers

Friday, November 21st, 2008

In late 2005 I was working at Bluewall and the owner of the company became convinced that some of the sites we were building were about to go supernova, so he got a second server from Rackspace. This was a stupid decision, but it is a common one. I find clients often get over-excited about their projects. I’m sure some of this excitement is a healthy part of being emotionally invested in the project that you’re trying to bring to life, but that same excitement can lead to needless expense. Dedicated servers from Rackspace are expensive and, as it turned out, Bluewall’s need for extra servers was years in the future. The purchase decision in 2005 represents a lot of wasted money. So why didn’t the second server get shut down? Because once you have even one (low traffic) site on a server, it becomes a pain to shut down that server.

The Second Road had a similar experience. They started off with 3 dedicated servers, plus a firewall and load balancer, from Rackspace. The cost was $1,800 a month. When we got involved with the project, we moved the site to a server from Hostway, which costs us $150 a month (and on which we have several other websites, all sharing one server). So far, this has met all of the Second Road’s needs on the web.

Dedicated servers represent a lot of potentially wasted resources, especially for a small startup which may grow quickly, but which also may NOT grow quickly. And what is growth, when you’re talking about traffic? If you get mentioned on BoingBoing, and suddenly you are getting 200 requests a minute, and your server collapses under the load, is it time to get another server? What if the spike in traffic lasts 2 days and then dies away and never comes back? If you get another server based on that one spike, you’ll regret it. Unless the spike is permanent, in which case you’ll regret not getting that extra server sooner.

These experiences have me interested in the new cloud services, which promise just the amount of computing power that your site needs, no more, no less. The idea of the “cloud” is that you can scale your resources up or down in increments finer than a whole server. So for the next big client we get, we will try a cloud service. These are the 3 we are looking at right now:

Mosso (this was just bought by Rackspace)

Amazon

IronServers

There has been a lot of investment, by a lot of companies, in cloud services and, really, it is hard to keep up. The fact that as good a company as Rackspace was willing to buy Mosso speaks well of Mosso. Since there is no way to evaluate all the contenders now competing in this field, we are forced to look at the best known, and then one of the lesser-known ones, just as a test.

For the Amazon services, one of the best known users is SmugMug. They’ve mostly written about the S3 service, but also some about the cloud service.

Amazon is also now moving into the Content Distribution Network business. This is an aspect of building websites that we haven’t yet researched much. My feeling is that neither we nor our clients will ever be interested in the nitty-gritty details of getting content to every corner of the world. We don’t want to be up late at night wondering “Is the site running fast enough in Malaysia?” So we’re pleased to see Amazon in this space and we hope they do a good job, because we will certainly give them a close look if we ever work with a client who really needs that kind of scale.

This is from the article on Ajaxian:

With nearly 2.5 million requests per day to the jQuery website, the jQuery project team is constantly on the look out for ways to decrease hosting costs while still managing the growing number of requests for the site’s resources. Originally leveraging Amazon S3 for many of their static pages, the project has now turned to Amazon’s new CloudFront CDN. The change has allowed for jQuery pages to be globally hosted as opposed to being centrally located in Amazon’s Seattle-based S3 hosting center.

In tests, John Resig, team lead for the jQuery project, noticed substantial performance gains based on the switch:

I ran a similar test here in Boston and even managed to see a large improvement. I was seeing latency of anywhere from 50-200ms on Amazon S3, but only a latency of 17-19ms with CloudFront.

What does all of this mean? It means that the jQuery site is going to load even faster than it does now. We already receive some excellent hosting from Media Temple but being able to off-load these static files to the fast-loading servers will only make for a better browsing experience.

In less than 24 hours the project had received almost 2.5 million requests for over 50GB of data. The only drawback is an increase in bandwidth costs but still substantially less than that of a traditional hosting plan. The jQuery project makes use of the Google AJAX API as well and recommends it as the choice for linking to the jQuery and jQuery UI libraries.