Archive for the ‘Scalability’ Category

Hybrid Cluster — self-healing, auto-scaling & very forgiving

Wednesday, December 28th, 2011

You won’t have heard much from Hybrid Logic recently — now with an early stage tech company this can mean one of two things: either they’ve given up and gone home, or they’re mad busy innovating, building and shipping their product.

I’m pleased to report that in our case it’s the latter ;-)

Hybrid Cluster has had an extraordinary year of development and we’re on the cusp of releasing some very exciting new features for the world to get to grips with. What we’ve done is nothing short of revolutionary — we’re changing the fundamental assumptions about how your servers can co-operate together, how applications and databases can scale, and how companies do business continuity planning across data centres.

In the “old world”, a server is seen as a single entity; one which has its own specific configuration, and which hosts a set of applications and databases. If you’re staying up-to-date with the industry, you’ll have virtualized that server and put its storage in a centralized storage system (a SAN, for example) — now that’s all very well, but the virtual server is still conceptually a single server and can still suffer from these three problems:

  1. Hardware and networks fail
  2. Servers get over-loaded when there are spikes in demand
  3. Users make mistakes

At Hybrid Logic it’s our mission to solve all three of these problems for your existing LAMP applications, and our software — available for license today — solves them by employing a fundamental paradigm shift in industry thinking.

Individual servers and storage systems should not be the unit of concern for you, the developer or administrator. Applications, databases and mailboxes should be — the servers should look after themselves.

Now, if you look a little further down the road, this is the way the industry’s moving — in cloud, the move from IaaS to PaaS is exactly this — developers and sysadmins should not have to think about individual server instances ever again. Their servers should form a cognizant co-operative group on their own. This is exactly what our software does — it transforms a bunch of dumb, commodity machines, connected by slow and unreliable network connections, into a loosely-coupled distributed cluster where the failure of an individual server or even an entire data centre is automatically healed so that the cluster carries on working — keeping your applications, databases and mailboxes online even in the face of catastrophic failure of an entire region.

I’m Luke, the CTO here at Hybrid Logic, and in the next few blog posts I’m going to give you a bit of insight into how we do it ;-)

Happy New Year!

Cheers,
Luke

Hybrid Cluster multi-region branch

Thursday, August 18th, 2011

We’re very excited to announce the imminent release of our multi-region branch. In our labs we now have one cluster spanning East & West coast USA on cloud infrastructure and dedicated hardware in Europe, with densities of over 1,000 websites per node.

More to come ;-)

Stability, performance and rendering improvements

Friday, November 19th, 2010

It’s been a busy first week of the beta here at Hybrid Logic HQ, and we’re very pleased by the response we’ve had to the start of the beta — thank you! There’s been a buzz of activity on the forums and we love it when you give us feedback, so please carry on experimenting with our software and tell us what you think.

Along with the awesome feedback from yourselves, which we taking careful note of, we’ve also been doing some improvements of our own. Here’s a quick breakdown of the fixes and improvements which have now been deployed across all your clusters:

  • At the deployment level, we can now add new instances to an existing cluster. This means we can unintrusively upgrade a cluster to include new or better spec machines (irrespective of physical location) so that you can scale your hosting operation seamlessly.
  • The “God Pod” has had some significant responsiveness improvements. When we first launched, it wasn’t the most responsive user experience in the world. It’s much quicker and more accurate now, so give it a go!
  • We’ve made significant improvements to the stability of the core web hosting platform. We’ve solved several problems which were causing “Default site on X” error messages where your websites should have been. Another bug was causing databases to sometimes become inaccessible, and we’ve solved that too. Stability is looking a lot better.
  • We’ve improved the intelligence of the core load balancing algorithms, meaning that the decisions to move a site from one server to another (due to load) is now a fair bit smarter, and you should see fewer unnecessary load balancing events. As ever, there’s still room for improvement.
  • We’ve enabled swap on all your machines, so that if your 1.4GB memory does ever get fully used up, your instances will just become slow for a few minutes as they recover, rather than falling over or crashing completely.
  • When a site is about to be moved from one server to another, what happens internally is that requests for that site get “paused” by the distributed proxying layer which runs on top of the web and database servers. This pausing happens so that during the transfer of the site or database from one server to another, none of the requests return error messages — rather, the user just experiences a slow page load. The Load Balancing Diagram in the God Pod now shows a dotted line around a site when it is paused. This gives you a better insight into what’s happening within the cluster during the process of moving sites from one server to another to keep your servers healthy and balanced.
  • Performance has been improved massively. Previously, load balancing events caused sites to be blocked for up to 20 seconds. We’ve managed to get this down to 3-6 seconds in most cases, resulting in fewer requests building up. We’ve also made some code changes which have made everything feel a lot snappier. We will be continuing to optimise for performance over the coming weeks and months — this is only the start!
  • Numerous tweaks and improvements to functionality in the Control Panel have also been deployed (more details on this will be posted to our forum in due course).

We can’t wait to see how much better we can make it next week!

The God Pod is ready…

Friday, November 12th, 2010

Watch this space.

Lightning talk at CloudCamp London

Thursday, October 21st, 2010
For those who missed it, here’s the text of my talk at CloudCamp London yesterday. CloudCamp was great fun, thanks Chris!

Slide 1

Hi, I’m Luke from Hybrid Logic and I’m going to talk about filesystem snapshots and how they are useful in cloud computing.

Slide 2

A snapshot is an instantaneous point-in-time copy of your filesystem. The blocks that haven’t changed aren’t needlessly copied so you can store lots of snapshots with less disk space than you’d expect.

What are snapshots good for? Well, have you ever deleted important files by accident? Keeping snapshots lets you quickly “roll back time”.

Also, if you can copy your snapshots onto a different server, they can act as a great backup which you can recover very quickly from.

Cloud instances aren’t perfect, and data loss/instance failure in not un-heard-of in public clouds. Whole industries have grown up around dealing with the transient, ephemeral nature of cloud instances.

Being able to take a snapshot of your server and clone it brings a new level of manageability as well. If you’ve ever started up an EC2 instance, then you have – perhaps unwittingly – cloned a snapshot of a disk image.

Slide 3

The cloud storage model

Infrastructure is the underlying compute hardware, whether real or virtualised. With respect to storage, the infrastructure corresponds to the block device exposed by, say, EBS on EC2, or the physical hard disk in a non-cloud data centre.

The platform includes the Operating System and crucially the Fileystem which you choose to install on your cloud instances.

My claim is that it’s better to have the snapshotting done at the filesystem level, than to rely on the underlying infrastructure’s snapshotting capabilities, if they exist at all.

Slide 4

The primary benefit of doing this is the removal of vendor lock-in. By having snapshots at the platform level you can replicate data between servers in entirely different cloud infrastructures, for example, you can move data between EC2 to ElasticHosts and back again. Plus you can move snapshots in and out of the cloud entirely, allowing you to build hybrid clouds without expensive, complex virtualisation in your own data centre. In total, this reduces your dependence on any one provider, which reduces your risk of downtime.

Slide 5

Relying on infrastructure for your snapshots brings some other problems too. When you take a snapshot with something like EBS, because the infrastructure can’t communicate “up” to the platform, it has no way of telling the filesystem that the snapshot is about to happen. If the filesystem is mid-way through a write when the snapshot takes place, you’ll end up with a corrupt snapshot.

One solution is to use a “pausable” filesystem, such as XFS, so you can flush it to disk and block the flow of writes during a snapshot. But because you require interaction between the two different layers, the process of pausing the filesystem and taking the snapshot can take a long time, which has been known to crash MySQL.

ZFS allows the unification of these layers. By some Linux kernel hackers this has been described as a “rampant layering violation” but I prefer to think of it as a elegant refactoring, because in fusing these two layers together ZFS becomes faster and smarter, guaranteeing O(1), consistent filesystem snapshots.

Slide 6

Comparison: filesystems with snapshots

XFS on EBS gives you vendor lock-in and so do any other infrastructure-based solutions. You also can’t use it to do live migration of snapshots from one server to another, called send/recv replication.

Btrfs is the Linux answer to the next-gen filesystem but it’s immature and not yet production ready.

Veritas does snapshots, but while it’s mature and stable, it’s very expensive.

This leaves ZFS, which is mature, stable and fast, and which allows you to send incremental changes between snapshots from one server to another. The only thing holding it back from mass adoption is the a lack of a performant Linux kernel port. But ZFS for Linux is coming in December. I’ve tested the beta, and it’s promising.

Here’s an example of how to do an incremental send and receive of a snapshot with ZFS to keep a slave up-to-date with the filesystem on a master.

Slide 7

Worked example of incremental ZFS replication

We create a zfs filesystem called “bucket1″. We put some data into that filesystem and then we snapshot it.

Then we send the first snapshot in full over to the slave which receives it and saves it to disk.

Then we change some bytes in the data on the master, snapshot the filesystem again, and send an incremental diff over to the slave.

This means that only the blocks that have changed get sent from one machine to another, so it’s very efficient.

Slide 8

We’re doing some cool stuff with this incremental zfs replication. We’ve built an asynchronously replicated cluster filesystem on top of it and we’re using that to build web clusters which have these nice properties. You can kill any machine safely in the knowlegde that a 10-second old backup of all its data will be stored safely across the cluster. By mounting many snapshots read-only, you can get horizontal scalability for read-heavy loads. And by picking the latest snapshot and stashing any others after a netsplit, you gain partition tolerance.

Furthermore, the incremental snapshots trick lets us automatically bring offline machines up to date from any timestamp, efficiently sending only the data which has changed between the time the machine went offline to when it came back.

In conclusion, ZFS let’s you do all this, it already runs on FreeBSD (our primary platform) and it’s coming to Linux in December, so check it out.

Slide 9

Thanks!

Follow us on Twitter: @hybridcluster / @lmarsden

Native ZFS on Linux, GA in December 2010: zfs.kqinfotech.com

Hybrid Web Cluster goes multi-data-centre with awesome redundancy, scalability and manageability.

Saturday, May 8th, 2010

I just thought I’d share some internal Hybrid Logic correspondence with the world, as it’s pretty exciting…

From: Luke
To: Mike, Rob, Kieran

Hey guys,

I’ve successfully set up our first cross-datacentre web cluster today!

Three servers are in our rack at Telehouse and three are hosted with ElasticHosts (also in London). The most impressive demo of course is turning off an entire data centre, which I just simulating by doing a hard power-off on all three ElasticHosts machines simultaneously *just* after updating a WordPress blog whose database was being hosted there.

I held my breath, counted to ten, clicked refresh, and voila, my latest post was still there :D

I’ve also got real failure-resistant DNS set up, no more /etc/hosts hacks. All websites resolve to all six IP addresses (with the current live one first in the list). All modern web browsers fail-over to a new IP within one second of the current IP failing. This is serious redundancy which you usually have to pay a lot of cash for.

More good news – ElasticHosts have just announced a new data centre in San Antonio, USA. Imagine a “Prefer my site to be hosted in the US” checkbox in our control panel… that’s gonna be a reality.

It’s looking good!

Cheers,
Luke

Mike replied with a few questions, which I answered:

I just had a thought. Does this also make it an awesome migration tool? For example if I wanted to move all my client sites to our own web cluster it would be an admin nightmare. However if we used HL to add web cluster machines to our existing network could we then quietly turn off our existing servers and our world keep spinning?

If you’re trying to move from one data centre where the web cluster software can be installed at both then yes, it’s a life-saver. Even if you had to move physical hardware from one location to another, you could do it by simply unplugging one server at a time (everything carries on working), moving them from A to B (also one at a time), and telling the web cluster where to look. The transition would look like this:

(Site A – old data centre, site B – new data centre, each * represents a server)

        A * * * * *    B
        A * * * *      B *
        A * * *        B * *
        A * *          B * * *
        A *            B * * * *
        A              B * * * * *

And voila, you’ve seamlessly *moved data centres* without a minute’s downtime for any of your sites.

It doesn’t solve the initial problem of moving all my client sites to the new software infrastructure. That will be done manually, and gradually, site-by-site. With the new DNS system now working you’d set up a bunch of nameservers, say ns1.tpj-cloud.com, ns2.tpj-cloud.com which would be hosted by the web cluster itself, and then you’d migrate the sites one-by-one to your new web cluster, at which point they gain the ease of transferability between data centres.

Whether your existing dedicated server host will support FreeBSD is another question… there’s always the Depenguinator!

Also would I be right in assuming that all machines in a web cluster have to have all applications installed on them? Sorry if that’s a Noddy question.

Not at all, it’s a good question. Let me try and explain how it works:

Not every site needs to be copied onto every single server. Suppose you have ten servers and you want a “redundancy guarantee” that you could turn off *any four servers* and not lose any data (perhaps the maximum number of servers is any single data centre is four, and you want to be data-centre-failure-proof).

What this means is that every website has to be copied onto 5 servers — consider the worst case for some website X, if four of the five servers which had copies of X went down, there’d always be one server left with that data. This is called N+1 redundancy. N is the level of redundancy (or acceptable risk), and N+1 is the number of copies of each piece of data (website, database) you need at minimum to have to guarantee that level of redundancy.

This number N, the redundancy level, is a knob you can tweak within the web cluster. It represents a trade-off between disk and bandwidth usage (replicating websites to lots of servers takes more network and I/O bandwidth) and the redundancy of your web cluster (its resistance to failure). If you want your web cluster to be nuclear-bomb-proof then you need N to be as high as the maximum number of machines in any one city (assuming only one city gets hit at a time!). But if you’re less paranoid about mutually assured destruction and want to make better use of your bandwidth and disk space, you might set your level of acceptable risk equal to *the probability of any two machines failing simultaneously*, you can set N=2. Then there’ll always be three copies of every single piece of data, so that you can’t possibly lose it when two machines go down.

Note that setting N to be a small constant number with respect to the number of machines in your web cluster enables the scalability of your web cluster. If every website only has to be replicated to 2 machines in a 100 node web cluster, you get 50 nodes’ worth of disk space and capacity to play with. When you scale that same web cluster to 200 machines, you get 100 nodes’ worth of disk space. In other words, you get the holy grail of linear horizontal scalability, while retaining your chosen level of redundancy.

At the other extreme, if every website has to be replicated to 50 machines, you only get 2 nodes’ worth of disk space to play with, but you do have an incredibly resilient system!

Any questions? Just shout in the comments and I’ll get back to you!