Tuesday 18 January 2011

Clouds and resilience ...

The floods in Brisbane were a bloody awful mess. And one of the consequences was neatly summed up in this anonymised tweet:

All UQ Library servers on both DR sites being shutdown due to flood and power cuts.. Hopefully this will push the case for UQ going cloud.

Now, I don't know, but I'm guessing that UQ had a classic dual data centre design, with machine rooms in two geographically separate locations, both of which got flooded. They are not the only institution to operate such a design, and universities are prone, in these cash-strapped times, to use in-house facilities, putting their backup facility just the other side of campus rather than in a properly separate location several kilometres away.

Of course there are no guarantees in this world and you could simply have bad luck and choose a backup provider that also went out of action due to rain, hail, flood or a plague of demons.

Cloud initially seems attractive as an alternative due to its distributed nature, but you need to be sure about what you are buying - for example, is your data replicated to multiple locations for resilience, or is it just out there on a server in Ktoznayetistan? If the latter, you haven't gained a lot more than having it replicated to a server a few hundred kilometres away.

For example, when I was on the steering group for the UK mirror service in the 1990s, data was held at Kent and Lancaster universities and replicated between the two, with a bit of load balancing logic to stop either site being overloaded. The net result was near-perfect uptime for the service, if not for the individual sites. Adding a third site would have made things even better.
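That load balancing logic doesn't have to be clever. Something along the lines of the sketch below would do - the host names and the load-reporting URL are made up for illustration, not a description of how the mirror service actually worked:

    import urllib.request

    # Hypothetical mirror sites - the host names and the /load endpoint
    # are invented for illustration only.
    MIRRORS = ["http://mirror-kent.example.ac.uk",
               "http://mirror-lancaster.example.ac.uk"]

    def current_load(mirror):
        """Ask a mirror how busy it is; return None if it is unreachable."""
        try:
            with urllib.request.urlopen(mirror + "/load", timeout=5) as resp:
                return float(resp.read().decode())
        except Exception:
            return None   # flooded, powered off or just not answering

    def pick_mirror():
        """Direct the request to the least loaded mirror that is actually up."""
        loads = {m: current_load(m) for m in MIRRORS}
        alive = {m: load for m, load in loads.items() if load is not None}
        if not alive:
            raise RuntimeError("no mirror is reachable")
        return min(alive, key=alive.get)

    print("serving from", pick_mirror())

As long as one site answers, the service keeps running; losing a site costs you capacity, not availability.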

And the key was real geographic separation, such that a major local event could knock out one site but the other would keep going. You don't need cloud to do this - although using cloud is a valid approach - what you want is decent replication and geographic separation, which isn't cheap, as you have both network traffic costs and storage costs to consider.
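The replication half is equally unglamorous. A minimal sketch, assuming a remote replica host a few hundred kilometres away that you can reach over ssh (the host name and paths are invented), is just rsync run regularly - and the compression flag is a reminder of where the network traffic costs come from:

    import subprocess

    # Hypothetical remote replica at a properly separate site.
    REMOTE = "replica@dr-site.example.ac.uk:/srv/replica/"
    LOCAL = "/srv/data/"

    def replicate():
        """Mirror the local data directory to the remote site over ssh.

        -a        preserve permissions, timestamps and so on
        -z        compress in transit (this is the network traffic you pay for)
        --delete  keep the replica an exact copy, removing files deleted locally
        """
        subprocess.run(["rsync", "-az", "--delete", LOCAL, REMOTE], check=True)

    if __name__ == "__main__":
        replicate()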

And of course, if you move all of your data to the cloud, you want to be damn sure that it's in two non-adjacent locations at all times - it rains in Ktoznayetistan as well.

But data is, well, just data. Without the execution devices, i.e. the web servers and database servers that act on the data to provide content management, websites, institutional repositories, mail services and the like, all you have is a pile of expensively replicated ones and zeroes.

Now most sites have multiple servers, with a bit of load balancing, doing the critical jobs, and these days most are virtualised, meaning that they can be run on someone else's infrastructure just as well as on yours - just add some load balancing logic.
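As a sketch of that logic, assuming one instance in your own machine room and one hosted elsewhere (both host names invented), all you really need is a health check and an ordered list of places the service can run:

    import urllib.request

    # Where the virtualised service can run, in order of preference. The
    # first entry is our own machine room, the second a hosted (IaaS)
    # instance somewhere else entirely - both host names are invented.
    BACKENDS = ["http://app1.machineroom.example.ac.uk",
                "http://app1.hosted.example.com"]

    def healthy(backend):
        """A backend is healthy if its status page answers promptly."""
        try:
            with urllib.request.urlopen(backend + "/status", timeout=3) as resp:
                return resp.status == 200
        except Exception:
            return False

    def choose_backend():
        """Route traffic to the first healthy backend, local before hosted."""
        for backend in BACKENDS:
            if healthy(backend):
                return backend
        raise RuntimeError("no backend available - time for the DR plan")

    print("routing requests to", choose_backend())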

There are companies that will provide hosting services, otherwise known as Infrastructure as a Service (IaaS). Again it costs money, but as the old UK mirror service experience shows, it does deliver performance and resilience. Remember, though, that what you are paying for is simply resilience, meaning you need to clearly understand which servers you need to replicate and why, and the consequences for those you choose not to replicate. And because the servers are in a dynamic load balancing configuration, you need to understand that there will be network traffic charges and general running costs.

So cloud can be part of the answer, but it is not the answer. Peering is possibly a better answer, where groups of universities, who all run broadly similar services on broadly similar hardware, get together and provide both data and execution hosting for each other, ensuring reasonable geographic separation - say, so that you have both data and execution in three places: Sydney, Canberra and Adelaide, or Kent, York and Glasgow. But remember, you can have cheap, or resilient, but not both ...
