Archive for the ‘Web cluster’ Category

Hybrid Cluster — self-healing, auto-scaling & very forgiving

Wednesday, December 28th, 2011

You won’t have heard much from Hybrid Logic recently. With an early-stage tech company this can mean one of two things: either they’ve given up and gone home, or they’re mad busy innovating, building and shipping their product.

I’m pleased to report that in our case it’s the latter ;-)

Hybrid Cluster has had an extraordinary year of development and we’re on the cusp of releasing some very exciting new features for the world to get to grips with. What we’ve done is nothing short of revolutionary: we’re changing the fundamental assumptions about how your servers can co-operate, how applications and databases can scale, and how companies do business continuity planning across data centres.

In the “old world”, a server is seen as a single entity; one which has its own specific configuration, and which hosts a set of applications and databases. If you’re staying up-to-date with the industry, you’ll have virtualized that server and put its storage in a centralized storage system (a SAN, for example) — now that’s all very well, but the virtual server is still conceptually a single server and can still suffer from these three problems:

  1. Hardware and networks fail
  2. Servers get over-loaded when there are spikes in demand
  3. Users make mistakes

At Hybrid Logic it’s our mission to solve all three of these problems for your existing LAMP applications, and our software — available for license today — solves them by employing a fundamental paradigm shift in industry thinking.

Individual servers and storage systems should not be the unit of concern for you, the developer or administrator. Applications, databases and mailboxes should be — the servers should look after themselves.

Now, if you look a little further down the road, this is the way the industry’s moving — in cloud, the move from IaaS to PaaS is exactly this — developers and sysadmins should not have to think about individual server instances ever again. Their servers should form a cognizant co-operative group on their own. This is exactly what our software does — it transforms a bunch of dumb, commodity machines, connected by slow and unreliable network connections, into a loosely-coupled distributed cluster where the failure of an individual server or even an entire data centre is automatically healed so that the cluster carries on working — keeping your applications, databases and mailboxes online even in the face of catastrophic failure of an entire region.

I’m Luke, the CTO here at Hybrid Logic, and in the next few blog posts I’m going to give you a bit of insight into how we do it ;-)

Happy New Year!

Cheers,
Luke

Hybrid Cluster multi-region branch

Thursday, August 18th, 2011

We’re very excited to announce the imminent release of our multi-region branch. In our labs we now have one cluster spanning East & West coast USA on cloud infrastructure and dedicated hardware in Europe, with densities of over 1,000 websites per node.

More to come ;-)

More performance and stability improvements

Sunday, December 5th, 2010

Hello everyone,

We’ve been working hard over the weekend and have some good core cluster stability and performance improvements to show for it, along with a new internal performance testing tool and a sneak peek of our Hybrid Sites project:

First, a bug in Twisted which was causing the distributed proxying layer to sometimes stop accepting new requests has been worked around. This means you shouldn’t see database connection errors any more. If you do, please report them by posting on the forums!

Second, we’ve now got a new internal performance testing tool which shows us a scatter graph of a cluster’s response time and stability:

This plot shows, for example, an average response time of around 400ms for a WordPress blog and our Control Panel (the CP in red, WordPress in blue). The few outliers show latencies of up to 10 seconds when a server fails! Much better than the usual hours or days of downtime!!

Next, I’ve got a couple of sneak peeks of our Hybrid Sites website and Control Panel. This will be our flagship cloud web hosting platform, perfect for developers, designers and publishers alike.

And here’s the control panel, showing off our whitelabel features:

A massive amount of under-the-hood work has gone on with the Control Panel in readying our incredibly powerful reseller system for Hybrid Sites’ go-live in a couple of weeks.

We are also happy to announce that Hybrid Sites will be launched in association with ElasticHosts, giving their customers access to powerful, simple cloud web hosting — a much easier option than setting up their own Linux box over SSH. Hybrid Sites will in fact be launched across multiple cloud providers, including CloudSigma, to provide impressive cross-cloud redundancy.

We also have a reseller API which presently has 35 commands and is still growing. This will allow you to set up reseller accounts, take payments, set up websites and databases, and purchase domains, all through a powerful REST API. This will come hand-in-hand with the WordPress plugin which runs our (or your) frontend web hosting company page. Hybrid Sites will be the first to prove this technology tool-chain :-)

Lots more to come this week, but for now, please try to thrash the knackers off your beta clusters, and get in touch if you want to test drive it and you’re not on the beta yet!

Cheers!

Luke Marsden, CTO

Running FreeBSD 8.1 as a Xen HVM DomU on Flexiant

Friday, November 26th, 2010

Just thought I’d share the incantations which were necessary to get the FreeBSD 8.1 XENHVM kernel to work well on Flexiant, with paravirtualised network and disk:

Just before the kernel boots (which you have to be quick to catch with Flexiant’s VNC client) hit F6 on the bootloader and type:

set hw.clflush_disable=1
boot

This will allow you to boot even a GENERIC kernel. Once you’re booted, chuck hw.clflush_disable="1" into /boot/loader.conf to make this permanent.
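
If you script your VM setup, the permanent step can be made safe to run more than once. This is only a sketch: the helper name add_loader_tunable is made up for illustration, and it simply appends the line to the file you point it at (normally /boot/loader.conf, as root) if no line already starts with that key.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): append KEY="VALUE" to a
# loader.conf-style file unless a line already starts with KEY=.
add_loader_tunable() {
    file="$1"; key="$2"; value="$3"
    # awk exits 0 only if some line begins with the literal text KEY=
    if awk -v k="${key}=" 'index($0, k) == 1 { found = 1 } END { exit !found }' \
        "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s="%s"\n' "$key" "$value" >> "$file"
}
```

Usage would be something like add_loader_tunable /boot/loader.conf hw.clflush_disable 1; running it a second time leaves the file unchanged.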

You’ll then want to build your own kernel for paravirtualised network and disk drivers. Edit the XENHVM kernel config (see the FreeBSD handbook on compiling your own kernel): comment out the MODULES_OVERRIDE line which disables building all the modules (assuming you want ZFS support) and also comment out the whole section about WITNESS and INVARIANTS, as having these enabled will slow down your kernel quite significantly.

Then you’ll need to patch the network driver as per this post (manually, since the code has changed a bit), else you get a lot of dropped packets:

http://www.mail-archive.com/freebsd-xen@freebsd.org/msg00598.html

Then just (as root):

cd /usr/src
make buildkernel KERNCONF=XENHVM
make installkernel KERNCONF=XENHVM
shutdown -r now

And enjoy your speedy FreeBSD 8.1 VM in the cloud!

Hybrid Web Cluster now runs on Xen HVM IaaS providers

Friday, November 26th, 2010

The proof:

Lots more IaaS providers, here we come!

FTP support now in Hybrid Web Cluster

Tuesday, November 23rd, 2010

Update: This has now been deployed to your beta web clusters! And check out the new video below…

We now have clustered FTP support! This means you can open an FTP connection to any node on the cluster at any time, and you’ll get internally redirected to the correct backend server for that site, and authenticated against the password you’ve set up in your Control Panel.

FTP is one of the most annoying and broken protocols on the planet (a separate data connection, whyyy?), but it’s also crucial for uploading your website — so it’s rather good that we support it now :-)

This inherits the same nice properties which HTTP and MySQL requests enjoy while traveling through our distributed proxying layer, such as stopping site-juggling from occurring while an FTP connection is happening (so your site won’t get load-balanced if you’re uploading to it). It also means that if your site is being juggled at the same moment you connect to it, you’ll get a slight delay before your FTP connection is established, rather than an error message.

We are nearly feature complete!

Stability, performance and rendering improvements

Friday, November 19th, 2010

It’s been a busy first week of the beta here at Hybrid Logic HQ, and we’re very pleased by the response we’ve had to the start of the beta — thank you! There’s been a buzz of activity on the forums and we love it when you give us feedback, so please carry on experimenting with our software and tell us what you think.

Along with the awesome feedback from yourselves, which we’re taking careful note of, we’ve also been making some improvements of our own. Here’s a quick breakdown of the fixes and improvements which have now been deployed across all your clusters:

  • At the deployment level, we can now add new instances to an existing cluster. This means we can unintrusively upgrade a cluster to include new or better spec machines (irrespective of physical location) so that you can scale your hosting operation seamlessly.
  • The “God Pod” has had some significant responsiveness improvements. When we first launched, it wasn’t the most responsive user experience in the world. It’s much quicker and more accurate now, so give it a go!
  • We’ve made significant improvements to the stability of the core web hosting platform. We’ve solved several problems which were causing “Default site on X” error messages where your websites should have been. Another bug was causing databases to sometimes become inaccessible, and we’ve solved that too. Stability is looking a lot better.
  • We’ve improved the intelligence of the core load balancing algorithms, meaning that decisions to move a site from one server to another (due to load) are now a fair bit smarter, and you should see fewer unnecessary load balancing events. As ever, there’s still room for improvement.
  • We’ve enabled swap on all your machines, so that if your 1.4GB memory does ever get fully used up, your instances will just become slow for a few minutes as they recover, rather than falling over or crashing completely.
  • When a site is about to be moved from one server to another, what happens internally is that requests for that site get “paused” by the distributed proxying layer which runs on top of the web and database servers. This pausing happens so that during the transfer of the site or database from one server to another, none of the requests return error messages — rather, the user just experiences a slow page load. The Load Balancing Diagram in the God Pod now shows a dotted line around a site when it is paused. This gives you a better insight into what’s happening within the cluster during the process of moving sites from one server to another to keep your servers healthy and balanced.
  • Performance has been improved massively. Previously, load balancing events caused sites to be blocked for up to 20 seconds. We’ve managed to get this down to 3-6 seconds in most cases, resulting in fewer requests building up. We’ve also made some code changes which have made everything feel a lot snappier. We will be continuing to optimise for performance over the coming weeks and months — this is only the start!
  • Numerous tweaks and improvements to functionality in the Control Panel have also been deployed (more details on this will be posted to our forum in due course).

We can’t wait to see how much better we can make it next week!

The God Pod is ready…

Friday, November 12th, 2010

Watch this space.

Beta Programme starting this week!

Monday, November 8th, 2010

Just sent this out to our elite team of beta testers:

We are very pleased to announce that we will be spinning up your beta cluster during the course of this week!

We are staggering the release of the beta clusters so that we can give everyone some personal attention. If you haven’t received your login details by the end of the week, don’t worry, we will be working on it. We expect to have all the clusters up and running by next Wednesday at the latest.

We are also putting together a video walkthrough which we’ll be launching this Wednesday. This will give you a good overview of the whitelabel cloud deployment technology that we’ve been building as well as what you can do with our web hosting platform.

It’s a beta preview, not the final version

As this is a beta preview, not everything is finished and fully working, but we do have more than the basics in place. You will be able to set up WordPress blogs with one click and upload your own PHP/MySQL websites via FTP. You can set up fully-replicated databases through our Control Panel. You will also have your own set of nameservers so that you can try pointing any real live domains you wish at your web cluster — although we’d recommend not deploying your company’s live website to your test cluster just yet!

We also have a helpdesk and billing system which you’ll have a chance to get to grips with. You’ll also be able to add web hosting reseller users with their own logins and change the colour scheme and header image for complete white-label brandability, either globally for your whole cluster or on a per-user basis.

And here’s the exciting bit: you’ll get a chance to play with our distributed load-balancing and failure tolerance algorithms in real-time. We have a really slick web interface in the works for viewing the live state of your cluster. You’ll also be able to drag up and down sliders to adjust the load (requests per second) on the different websites you’ve set up. And you’ll be able to pull the plug on a server and see that within seconds, the cluster reconfigures itself so that all your websites stay online. You can then turn the “failed” server back on and watch how the cluster redistributes load to it when it recovers.

But it’s not completely finished: it won’t do email (yet), it won’t do SSL (yet), and it might not always look gorgeous. It will definitely be a bit rough around the edges. This is where we need your help.

How you can help us during the beta

During the beta programme, we’ll be working flat-out to react quickly to any problems or issues you report. We’ll be pushing a lot of code updates and adding features in response to your feedback from one day to the next. This is a crucial part of our development process and we’re excited to have you on board.

Every element in our Control Panel has a flag icon next to it, which you can click to tell us what you think of it. If anything breaks please tell us by clicking on the flag and typing a quick description of what happened. This will automatically raise a ticket in your cluster’s helpdesk, which will be configured to notify us here at Hybrid Logic HQ.

We’ll also be launching a public forum this Wednesday. We really want to foster a community around the beta so please do sign up when we send you the link. During the beta programme, you’ll be able to contact us by clicking the flags, by logging on to the forum (our preferred way for us to discuss feature requests), or we can provide email, phone or Skype support. We’ll send you all the contact details at the same time that we send round your initial Control Panel logins.

It comes in two parts

You’ll actually get login credentials for two systems: your cluster control panel (the CP), and our master control panel (the Metapanel). If you’ve ever bought a dedicated server, you’ll be familiar with the concept: your cluster control panel is hosted on your cluster and lets you add users, websites and databases. The master control panel is where you log in to manage the cluster itself. The way I like to think of it is that the Control Panel is looking at the cluster from within, while the Metapanel is looking at the cluster from above.

We’ll give you full administrative access to your CP and a regular user account on the Metapanel. Both will be fully skinnable and support setting up reseller accounts. This should give you some idea of the possibilities for reselling entire clusters to your customers as well as reselling cloud web hosting on your own cluster. This may be of particular interest to you IaaS guys out there.

We’ll email you again mid-week with an update on our progress and the video for you to watch. If you’ve got any immediate questions, feel free to hit reply to this email.

Thank you for getting involved. We couldn’t do it without you.

Lightning talk at CloudCamp London

Thursday, October 21st, 2010

For those who missed it, here’s the text of my talk at CloudCamp London yesterday. CloudCamp was great fun, thanks Chris!

Slide 1

Hi, I’m Luke from Hybrid Logic and I’m going to talk about filesystem snapshots and how they are useful in cloud computing.

Slide 2

A snapshot is an instantaneous point-in-time copy of your filesystem. The blocks that haven’t changed aren’t needlessly copied so you can store lots of snapshots with less disk space than you’d expect.

What are snapshots good for? Well, have you ever deleted important files by accident? Keeping snapshots lets you quickly “roll back time”.

Also, if you can copy your snapshots onto a different server, they can act as a great backup which you can recover very quickly from.

Cloud instances aren’t perfect, and data loss or instance failure is not unheard of in public clouds. Whole industries have grown up around dealing with the transient, ephemeral nature of cloud instances.

Being able to take a snapshot of your server and clone it brings a new level of manageability as well. If you’ve ever started up an EC2 instance, then you have – perhaps unwittingly – cloned a snapshot of a disk image.

Slide 3

The cloud storage model

Infrastructure is the underlying compute hardware, whether real or virtualised. With respect to storage, the infrastructure corresponds to the block device exposed by, say, EBS on EC2, or the physical hard disk in a non-cloud data centre.

The platform includes the Operating System and, crucially, the Filesystem which you choose to install on your cloud instances.

My claim is that it’s better to have the snapshotting done at the filesystem level, than to rely on the underlying infrastructure’s snapshotting capabilities, if they exist at all.

Slide 4

The primary benefit of doing this is the removal of vendor lock-in. By having snapshots at the platform level you can replicate data between servers in entirely different cloud infrastructures. For example, you can move data from EC2 to ElasticHosts and back again. Plus you can move snapshots in and out of the cloud entirely, allowing you to build hybrid clouds without expensive, complex virtualisation in your own data centre. In total, this reduces your dependence on any one provider, which reduces your risk of downtime.

Slide 5

Relying on infrastructure for your snapshots brings some other problems too. When you take a snapshot with something like EBS, because the infrastructure can’t communicate “up” to the platform, it has no way of telling the filesystem that the snapshot is about to happen. If the filesystem is mid-way through a write when the snapshot takes place, you’ll end up with a corrupt snapshot.

One solution is to use a “pausable” filesystem, such as XFS, so you can flush it to disk and block the flow of writes during a snapshot. But because you require interaction between the two different layers, the process of pausing the filesystem and taking the snapshot can take a long time, which has been known to crash MySQL.

ZFS allows the unification of these layers. Some Linux kernel hackers have described this as a “rampant layering violation” but I prefer to think of it as an elegant refactoring, because in fusing these two layers together ZFS becomes faster and smarter, guaranteeing O(1), consistent filesystem snapshots.

Slide 6

Comparison: filesystems with snapshots

XFS on EBS gives you vendor lock-in and so do any other infrastructure-based solutions. You also can’t use it to do live migration of snapshots from one server to another, called send/recv replication.

Btrfs is the Linux answer to the next-gen filesystem but it’s immature and not yet production ready.

Veritas does snapshots, but while it’s mature and stable, it’s very expensive.

This leaves ZFS, which is mature, stable and fast, and which allows you to send incremental changes between snapshots from one server to another. The only thing holding it back from mass adoption is the lack of a performant Linux kernel port. But ZFS for Linux is coming in December. I’ve tested the beta, and it’s promising.

Here’s an example of how to do an incremental send and receive of a snapshot with ZFS to keep a slave up-to-date with the filesystem on a master.

Slide 7

Worked example of incremental ZFS replication

We create a zfs filesystem called “bucket1”. We put some data into that filesystem and then we snapshot it.

Then we send the first snapshot in full over to the slave which receives it and saves it to disk.

Then we change some bytes in the data on the master, snapshot the filesystem again, and send an incremental diff over to the slave.

This means that only the blocks that have changed get sent from one machine to another, so it’s very efficient.
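
The steps above can be sketched as a short script. Treat it as an illustration rather than the exact commands from the slide: the pool name "tank", the host "slave.example.com" and the snapshot names snap1 and snap2 are assumptions. By default it only prints the commands; set DRY_RUN=0 to actually run them (as root, on machines with ZFS).

```shell
#!/bin/sh
# Sketch of incremental ZFS replication. Assumed names (not from the talk):
# pool "tank", slave host "slave.example.com", snapshots snap1 and snap2.
DRY_RUN="${DRY_RUN:-1}"
FS="tank/bucket1"
SLAVE="slave.example.com"

run() {
    # Print the command line, or execute it when DRY_RUN=0.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

replicate_demo() {
    # Create the filesystem, put some data in it and take the first snapshot.
    run "zfs create $FS"
    run "dd if=/dev/urandom of=/$FS/data bs=1024k count=10"
    run "zfs snapshot $FS@snap1"
    # Send the first snapshot in full; the slave receives it and saves it.
    run "zfs send $FS@snap1 | ssh $SLAVE zfs receive $FS"
    # Change some bytes, snapshot again, then send only the difference.
    run "dd if=/dev/urandom of=/$FS/data bs=1024k count=1 conv=notrunc"
    run "zfs snapshot $FS@snap2"
    run "zfs send -i $FS@snap1 $FS@snap2 | ssh $SLAVE zfs receive $FS"
}
```

Calling replicate_demo prints the seven commands in order. The last one is the interesting bit: zfs send -i takes the older and the newer snapshot and emits only what changed between them. If the copy on the slave has been touched since snap1, the incremental receive may be refused; zfs receive -F rolls the target back to its latest snapshot first.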

Slide 8

We’re doing some cool stuff with this incremental zfs replication. We’ve built an asynchronously replicated cluster filesystem on top of it and we’re using that to build web clusters which have these nice properties. You can kill any machine safely in the knowledge that a 10-second-old backup of all its data will be stored safely across the cluster. By mounting many snapshots read-only, you can get horizontal scalability for read-heavy loads. And by picking the latest snapshot and stashing any others after a netsplit, you gain partition tolerance.

Furthermore, the incremental snapshots trick lets us automatically bring offline machines up to date from any timestamp, efficiently sending only the data which has changed between the time the machine went offline to when it came back.
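
One way to picture the catch-up step: find the newest snapshot that both machines already hold, then send only the difference from that point. The sketch below is an illustration, not the Hybrid Cluster implementation. The function names and the host are made up, and it assumes each snapshot list is given oldest first (for instance from zfs list -H -t snapshot -o name -s creation). It prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helpers. Each list is whitespace-separated full snapshot
# names (filesystem@snapshot), oldest first.

# latest_common UP_TO_DATE_LIST STALE_LIST
# Prints the newest snapshot from the first list that is also in the second.
latest_common() {
    common=""
    for snap in $1; do
        for other in $2; do
            if [ "$snap" = "$other" ]; then
                common="$snap"
            fi
        done
    done
    printf '%s\n' "$common"
}

# catch_up_cmd UP_TO_DATE_LIST STALE_LIST STALE_HOST
# Prints the zfs command that would bring STALE_HOST up to date.
catch_up_cmd() {
    base="$(latest_common "$1" "$2")"
    latest="$(printf '%s\n' $1 | tail -n 1)"
    fs="${latest%@*}"
    if [ -z "$base" ]; then
        # Nothing in common: fall back to a full send of the latest snapshot.
        printf 'zfs send %s | ssh %s zfs receive %s\n' "$latest" "$3" "$fs"
    elif [ "$base" = "$latest" ]; then
        # The stale machine already holds the latest snapshot.
        printf '# %s is already up to date\n' "$3"
    else
        # Send only the changes between the common snapshot and the latest.
        printf 'zfs send -i %s %s | ssh %s zfs receive %s\n' \
            "$base" "$latest" "$3" "$fs"
    fi
}
```

For example, if the up-to-date machine holds snapshots t1, t2 and t3 of tank/bucket1 and the returning machine holds t1 and t2, catch_up_cmd prints an incremental send from t2 to t3.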

In conclusion, ZFS lets you do all this, it already runs on FreeBSD (our primary platform) and it’s coming to Linux in December, so check it out.

Slide 9

Thanks!

Follow us on Twitter: @hybridcluster / @lmarsden

Native ZFS on Linux, GA in December 2010: zfs.kqinfotech.com