Archive for the ‘Multi-data-centre’ Category

Hybrid Cluster — self-healing, auto-scaling & very forgiving

Wednesday, December 28th, 2011

You won’t have heard much from Hybrid Logic recently — now with an early stage tech company this can mean one of two things: either they’ve given up and gone home, or they’re mad busy innovating, building and shipping their product.

I’m pleased to report that in our case it’s the latter ;-)

Hybrid Cluster has had an extraordinary year of development and we’re on the cusp of releasing some very exciting new features for the world to get to grips with. What we’ve done is nothing short of revolutionary — we’re changing the fundamental assumptions about how your servers can co-operate together, how applications and databases can scale, and how companies do business continuity planning across data centres.

In the “old world”, a server is seen as a single entity; one which has its own specific configuration, and which hosts a set of applications and databases. If you’re staying up-to-date with the industry, you’ll have virtualized that server and put its storage in a centralized storage system (a SAN, for example) — now that’s all very well, but the virtual server is still conceptually a single server and can still suffer from these three problems:

  1. Hardware and networks fail
  2. Servers get over-loaded when there are spikes in demand
  3. Users make mistakes

At Hybrid Logic it’s our mission to solve all three of these problems for your existing LAMP applications, and our software — available for license today — solves them by employing a fundamental paradigm shift in industry thinking.

Individual servers and storage systems should not be the unit of concern for you, the developer or administrator. Applications, databases and mailboxes should be — the servers should look after themselves.

Now, if you look a little further down the road, this is the way the industry’s moving — in cloud, the move from IaaS to PaaS is exactly this — developers and sysadmins should not have to think about individual server instances ever again. Their servers should form a cognizant co-operative group on their own. This is exactly what our software does — it transforms a bunch of dumb, commodity machines, connected by slow and unreliable network connections, into a loosely-coupled distributed cluster where the failure of an individual server or even an entire data centre is automatically healed so that the cluster carries on working — keeping your applications, databases and mailboxes online even in the face of catastrophic failure of an entire region.

I’m Luke, the CTO here at Hybrid Logic, and in the next few blog posts I’m going to give you a bit of insight into how we do it ;-)

Happy New Year!

Cheers,
Luke

Hybrid Cluster multi-region branch

Thursday, August 18th, 2011

We’re very excited to announce the imminent release of our multi-region branch. In our labs we now have one cluster spanning East & West coast USA on cloud infrastructure and dedicated hardware in Europe, with densities of over 1,000 websites per node.

More to come ;-)

Hybrid Web Cluster now runs on Xen HVM IaaS providers

Friday, November 26th, 2010

The proof:

Lots more IaaS providers, here we come!

FTP support now in Hybrid Web Cluster

Tuesday, November 23rd, 2010

Update: This has now been deployed to your beta web clusters! And check out the new video below…

We now have clustered FTP support! This means you can open an FTP connection to any node on the cluster at any time, and you’ll get internally redirected to the correct backend server for that site, and authenticated against the password you’ve set up in your Control Panel.

FTP is one of the most annoying and broken protocols on the planet (a separate data connection, whyyy?), but it’s also crucial for uploading your website — so it’s rather good that we support it now :-)

This inherits the same nice properties which HTTP and MySQL requests enjoy while traveling through our distributed proxying layer, such as stopping site-juggling from occurring while an FTP connection is happening (so your site won’t get load-balanced if you’re uploading to it). It also means that if your site is being juggled at the same moment you connect to it, you’ll get a slight delay before your FTP connection is established, rather than an error message.

We are nearly feature complete!

Stability, performance and rendering improvements

Friday, November 19th, 2010

It’s been a busy first week of the beta here at Hybrid Logic HQ, and we’re very pleased by the response we’ve had to the start of the beta — thank you! There’s been a buzz of activity on the forums and we love it when you give us feedback, so please carry on experimenting with our software and tell us what you think.

Along with the awesome feedback from yourselves, which we taking careful note of, we’ve also been doing some improvements of our own. Here’s a quick breakdown of the fixes and improvements which have now been deployed across all your clusters:

  • At the deployment level, we can now add new instances to an existing cluster. This means we can unintrusively upgrade a cluster to include new or better spec machines (irrespective of physical location) so that you can scale your hosting operation seamlessly.
  • The “God Pod” has had some significant responsiveness improvements. When we first launched, it wasn’t the most responsive user experience in the world. It’s much quicker and more accurate now, so give it a go!
  • We’ve made significant improvements to the stability of the core web hosting platform. We’ve solved several problems which were causing “Default site on X” error messages where your websites should have been. Another bug was causing databases to sometimes become inaccessible, and we’ve solved that too. Stability is looking a lot better.
  • We’ve improved the intelligence of the core load balancing algorithms, meaning that the decisions to move a site from one server to another (due to load) is now a fair bit smarter, and you should see fewer unnecessary load balancing events. As ever, there’s still room for improvement.
  • We’ve enabled swap on all your machines, so that if your 1.4GB memory does ever get fully used up, your instances will just become slow for a few minutes as they recover, rather than falling over or crashing completely.
  • When a site is about to be moved from one server to another, what happens internally is that requests for that site get “paused” by the distributed proxying layer which runs on top of the web and database servers. This pausing happens so that during the transfer of the site or database from one server to another, none of the requests return error messages — rather, the user just experiences a slow page load. The Load Balancing Diagram in the God Pod now shows a dotted line around a site when it is paused. This gives you a better insight into what’s happening within the cluster during the process of moving sites from one server to another to keep your servers healthy and balanced.
  • Performance has been improved massively. Previously, load balancing events caused sites to be blocked for up to 20 seconds. We’ve managed to get this down to 3-6 seconds in most cases, resulting in fewer requests building up. We’ve also made some code changes which have made everything feel a lot snappier. We will be continuing to optimise for performance over the coming weeks and months — this is only the start!
  • Numerous tweaks and improvements to functionality in the Control Panel have also been deployed (more details on this will be posted to our forum in due course).

We can’t wait to see how much better we can make it next week!

Hybrid Web Cluster goes multi-data-centre with awesome redundancy, scalability and manageability.

Saturday, May 8th, 2010

I just thought I’d share some internal Hybrid Logic correspondence with the world, as it’s pretty exciting…

From: Luke
To: Mike, Rob, Kieran

Hey guys,

I’ve successfully set up our first cross-datacentre web cluster today!

Three servers are in our rack at Telehouse and three are hosted with ElasticHosts (also in London). The most impressive demo of course is turning off an entire data centre, which I just simulating by doing a hard power-off on all three ElasticHosts machines simultaneously *just* after updating a WordPress blog whose database was being hosted there.

I held my breath, counted to ten, clicked refresh, and voila, my latest post was still there :D

I’ve also got real failure-resistant DNS set up, no more /etc/hosts hacks. All websites resolve to all six IP addresses (with the current live one first in the list). All modern web browsers fail-over to a new IP within one second of the current IP failing. This is serious redundancy which you usually have to pay a lot of cash for.

More good news – ElasticHosts have just announced a new data centre in San Antonio, USA. Imagine a “Prefer my site to be hosted in the US” checkbox in our control panel… that’s gonna be a reality.

It’s looking good!

Cheers,
Luke

Mike replied with a few questions, which I answered:

I just had a thought. Does this also make it an awesome migration tool? For example if I wanted to move all my client sites to our own web cluster it would be an admin nightmare. However if we used HL to add web cluster machines to our existing network could we then quietly turn off our existing servers and our world keep spinning?

If you’re trying to move from one data centre where the web cluster software can be installed at both then yes, it’s a life-saver. Even if you had to move physical hardware from one location to another, you could do it by simply unplugging one server at a time (everything carries on working), moving them from A to B (also one at a time), and telling the web cluster where to look. The transition would look like this:

(Site A – old data centre, site B – new data centre, each * represents a server)

        A * * * * *    B
        A * * * *      B *
        A * * *        B * *
        A * *          B * * *
        A *            B * * * *
        A              B * * * * *

And voila, you’ve seamlessly *moved data centres* without a minute’s downtime for any of your sites.

It doesn’t solve the initial problem of moving all my client sites to the new software infrastructure. That will be done manually, and gradually, site-by-site. With the new DNS system now working you’d set up a bunch of nameservers, say ns1.tpj-cloud.com, ns2.tpj-cloud.com which would be hosted by the web cluster itself, and then you’d migrate the sites one-by-one to your new web cluster, at which point they gain the ease of transferability between data centres.

Whether your existing dedicated server host will support FreeBSD is another question… there’s always the Depenguinator!

Also would I be right in assuming that all machines in a web cluster have to have all applications installed on them? Sorry if that’s a Noddy question.

Not at all, it’s a good question. Let me try and explain how it works:

Not every site needs to be copied onto every single server. Suppose you have ten servers and you want a “redundancy guarantee” that you could turn off *any four servers* and not lose any data (perhaps the maximum number of servers is any single data centre is four, and you want to be data-centre-failure-proof).

What this means is that every website has to be copied onto 5 servers — consider the worst case for some website X, if four of the five servers which had copies of X went down, there’d always be one server left with that data. This is called N+1 redundancy. N is the level of redundancy (or acceptable risk), and N+1 is the number of copies of each piece of data (website, database) you need at minimum to have to guarantee that level of redundancy.

This number N, the redundancy level, is a knob you can tweak within the web cluster. It represents a trade-off between disk and bandwidth usage (replicating websites to lots of servers takes more network and I/O bandwidth) and the redundancy of your web cluster (its resistance to failure). If you want your web cluster to be nuclear-bomb-proof then you need N to be as high as the maximum number of machines in any one city (assuming only one city gets hit at a time!). But if you’re less paranoid about mutually assured destruction and want to make better use of your bandwidth and disk space, you might set your level of acceptable risk equal to *the probability of any two machines failing simultaneously*, you can set N=2. Then there’ll always be three copies of every single piece of data, so that you can’t possibly lose it when two machines go down.

Note that setting N to be a small constant number with respect to the number of machines in your web cluster enables the scalability of your web cluster. If every website only has to be replicated to 2 machines in a 100 node web cluster, you get 50 nodes’ worth of disk space and capacity to play with. When you scale that same web cluster to 200 machines, you get 100 nodes’ worth of disk space. In other words, you get the holy grail of linear horizontal scalability, while retaining your chosen level of redundancy.

At the other extreme, if every website has to be replicated to 50 machines, you only get 2 nodes’ worth of disk space to play with, but you do have an incredibly resilient system!

Any questions? Just shout in the comments and I’ll get back to you!