occasionally useful ruby, ubuntu, etc


ACID Cloud databases to keep an eye on

So far, "database in the cloud" options have been pretty limited. You have Amazon's SimpleDB, which is slow, eventually consistent, and not really a database except in the loosest sense of the word; there's Amazon RDS, which is a managed MySQL instance and scales up, but not out; and of course you can run a database server yourself on any of the cloud servers out there (Amazon EC2, Rackspace, GoGrid, Linode, Slicehost, etc.). But none of these offers one-click scale-out, which is really the hardest part of database scaling. However, there are a few options cropping up, most of which are in beta.

Xeround provides a hosted solution (on EC2) based on MySQL. Instead of MyISAM or InnoDB as the engine, the Xeround engine is used. It's in closed beta, but you get 3 instances when you sign up for the beta. The fine print says you only get to use the service for 15 days, but I don't see this mentioned anywhere else so I'm not positive that's the case (I only signed up today). You can scale up one host at a time, which takes a few minutes (3-5 or so). Databases are kept completely in memory, but can optionally be written to disk (I presume asynchronously). There doesn't appear to be any WAN support (that is, replicating across data centers) yet.

There is no notion of master/slave with Xeround -- all can write, and quorum is used to resolve read conflicts.
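To make the quorum idea a bit more concrete, here's a minimal sketch in Python. This is not Xeround's actual algorithm (I haven't seen it documented); the function name `quorum_read` and the `(version, value)` reply format are assumptions made up for illustration. Each replica answers a read with a versioned value, and the read only succeeds if a strict majority of replicas agree.

```python
from collections import Counter

def quorum_read(replies):
    """Return the value that a strict majority of replicas agree on.

    `replies` is a list of (version, value) tuples, one per replica.
    This reply format is a hypothetical one chosen for the sketch.
    Raises ValueError if no majority exists.
    """
    if not replies:
        raise ValueError("no replicas answered")
    needed = len(replies) // 2 + 1  # strict majority of the replies
    winner, count = Counter(replies).most_common(1)[0]
    if count < needed:
        raise ValueError("no quorum: replicas disagree")
    return winner[1]  # the value part of the (version, value) pair
```

With three replicas where one is stale, `quorum_read([(2, "b"), (2, "b"), (1, "a")])` returns the value two of them agree on; if all three disagree, the read fails rather than guessing.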

I'm not in on the ScaleBase beta, so I can't describe it as well, but it's an "extremely sophisticated load balancer" according to them -- you can put your database servers pretty much anywhere (well, EC2, RDS, or on site...) and the load balancer can auto-scale the backend. As a plus, it works with more than just MySQL: it also supports Oracle, DB2, and SQL Server.

Scalability comes from auto-sharding (so the architecture is shared-nothing) -- the load balancer spreads the data across the underlying hosts automatically, and then aggregates the data as necessary on each request, which they say gives the system linear scalability. (I hope the load balancer automatically scales!) Each piece of data is stored on a primary host and a backup host.
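Here's a toy sketch of that auto-sharding idea in Python. Since I'm outside the beta, none of this reflects ScaleBase's real internals: the `ShardedStore` class, the hash-based routing, and the "backup is the next shard over" rule are all assumptions for illustration. It shows the two pieces described above: every key lands on a primary and a backup shard, and an aggregate query is answered by asking each shard for a partial result and combining them.

```python
import hashlib

class ShardedStore:
    """Toy shared-nothing store (hypothetical, not ScaleBase's design)."""

    def __init__(self, num_shards):
        if num_shards < 2:
            raise ValueError("need at least two shards for primary + backup")
        self.shards = [dict() for _ in range(num_shards)]

    def _primary(self, key):
        # Stable hash so a given key always routes to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        primary = self._primary(key)
        backup = (primary + 1) % len(self.shards)  # assumed placement rule
        self.shards[primary][key] = value
        self.shards[backup][key] = value

    def get(self, key):
        return self.shards[self._primary(key)].get(key)

    def total(self):
        # Scatter-gather: each shard sums only the rows it is primary for
        # (so backups are not double counted), then the partials are combined.
        partials = [
            sum(v for k, v in shard.items() if self._primary(k) == i)
            for i, shard in enumerate(self.shards)
        ]
        return sum(partials)
```

A real load balancer would also have to move data around when hosts are added, which is the hard part this sketch skips entirely.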

It sounds like a bit more work if you don't already have a working DB solution; it's not like Xeround where they actually give you hosts. Though the added control/flexibility isn't necessarily a bad thing, of course! You're on your own for upgrading DB software, for instance. I'm also not sure how performance is achieved -- sending data from your server, to a load balancer, which then goes to your databases...I presume you can somehow install the load-balancer close to the database, but this isn't revealed this side of the beta signup wall.

ScaleDB is another MySQL-based solution that provides ACID in the cloud. Here's their nice bulleted list:

  • Large data sets
  • Large numbers of concurrent users
  • Large numbers of tables with complex relationships (e.g. using joins, materialized views, etc.)
  • ACID compliant transaction processing
  • Load balancing (e.g. to address temporal shifts in usage patterns)
  • High-availability with smooth fail-over
  • An evolving application with changing data storage requirements
  • Lower Total Cost of Ownership (TCO)

Instead of shared-nothing (where there is a single server for each record, excluding backups), the system uses shared-cache clustering, which utilizes redundant nodes to improve performance (and provide failover).

Another novel idea that ScaleDB implements is "multi-table indexes", which...is just what it sounds like: indexes that can span multiple tables. ScaleDB claims that this enables the system to execute "multi-tables joins with single-table performance".
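As a rough mental model of how an index spanning tables could make a join cheap, here's a small Python sketch. I don't know how ScaleDB actually builds these indexes; the `customers`/`orders` tables, the column names, and both function names are hypothetical. The idea is simply that one index entry per join key points at the matching rows from both tables, so answering the join is a single lookup instead of a scan of the second table.

```python
from collections import defaultdict

def build_multi_table_index(customers, orders):
    """Map customer id -> the customer row plus all of its order rows.

    Both tables are lists of dicts; the schema is made up for this sketch.
    """
    index = defaultdict(lambda: {"customer": None, "orders": []})
    for customer in customers:
        index[customer["id"]]["customer"] = customer
    for order in orders:
        index[order["customer_id"]]["orders"].append(order)
    return index

def join_lookup(index, customer_id):
    """Return (name, item) pairs for one customer with a single index lookup."""
    entry = index.get(customer_id)  # .get() does not create a new entry
    if entry is None or entry["customer"] is None:
        return []
    name = entry["customer"]["name"]
    return [(name, order["item"]) for order in entry["orders"]]
```

The trade-off, presumably, is that every write to either table now has to keep the shared index up to date.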

I'm not exactly sure how one goes about expanding capacity -- in one section they claim ScaleDB gives you "Simple Plug-and-Cluster expansion", but the FAQ mentions that ScaleDB offers "High-availability through master-master clusters". So...it sounds like it takes the pain out of expanding reads, but writes could still be a bottleneck (with only two masters).

Not much is known about NimbusDB, except for a few bullet points here. Of particular note: it sounds like the design took extreme flexibility into account, enabling data to be shared across multiple clusters in "remote locations", or even letting you split a database between a data center and a public cloud service, if you're so inclined.

Okay, so why should you care? Well, if you're one for credentials, the CTO is Jim Starkey, who created the InterBase database system, conceived the date and BLOB data types, and might have come up with the MVCC system (heh, some dispute on that one).

VoltDB is a bit different in that you can run the software yourself, on your own hardware. If you took a regular RDBMS, pushed all of the tables into memory, then stripped out everything else for speed, you'd probably end up with VoltDB. It can also asynchronously write out data to disk for recovery scenarios. It only supports a subset of SQL 99, for now, but I imagine they're working on adding more support.
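To illustrate the "everything in memory, with asynchronous writes to disk for recovery" pattern, here's a minimal Python sketch. It has nothing to do with VoltDB's real implementation or API; the `InMemoryTable` class, its methods, and the JSON snapshot format are all assumptions for illustration. Reads and writes only touch a dict in memory, while a background thread writes a point-in-time copy to disk that can be reloaded after a crash.

```python
import json
import os
import threading

class InMemoryTable:
    """Toy in-memory table with asynchronous snapshots (hypothetical design)."""

    def __init__(self, snapshot_path):
        self.snapshot_path = snapshot_path
        self.rows = {}
        self._lock = threading.Lock()

    def put(self, key, row):
        with self._lock:
            self.rows[key] = row

    def get(self, key):
        with self._lock:
            return self.rows.get(key)

    def snapshot_async(self):
        """Copy the rows under the lock, then write them on a background thread.

        The copy is shallow, which is fine as long as rows are not mutated
        in place after being stored.
        """
        with self._lock:
            copy = dict(self.rows)
        writer = threading.Thread(target=self._write, args=(copy,))
        writer.start()
        return writer  # callers may join() if they need to wait

    def _write(self, copy):
        # Write to a temp file first, then swap it in, so a crash mid-write
        # never leaves a half-written snapshot behind.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(copy, f)
        os.replace(tmp_path, self.snapshot_path)

    @classmethod
    def recover(cls, snapshot_path):
        """Rebuild a table from the last snapshot on disk."""
        table = cls(snapshot_path)
        with open(snapshot_path) as f:
            table.rows = json.load(f)
        return table
```

Anything written after the last snapshot is lost on a crash, which is the price of not waiting on the disk for every write.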

General commentary

I've only used Xeround, and only then for about 15 minutes, but it's good to see that there will be options maturing over the next few months. One thing amused me in particular: every database here that has an actual full website (that is, all but NimbusDB) has white papers, or documents masquerading as white papers at least (some read like advertisements). Generally, when I pick a technology stack, I don't go looking for white papers, but if producing them becomes a trend, well, I might just have to start reading them. And the award for most white papers goes to...ScaleDB, for not one, not two, but 14 white papers.

My goal here isn't to provide a well-rounded or even fair (but hopefully accurate, heh) assessment of these technologies, but rather to pique your interest. If you find something particularly special about one of the databases (say, in one of the many white papers) I'd love to hear about it in the comments.

Filed under: web 2.0
Comments (5) Trackbacks (0)
  1. hi, do you know any other shared disk/shared everything database but oracle rac? thx a lot!!

    • Hmm, not off-hand, but xeround with k = n-1 (where k is the # of copies and n is the total number of nodes) would keep a copy of each record in the memory of each node. Not shared disk, for sure, but I’m not sure exactly what you’re looking for, either. ScaleDB might also be capable of this, since they advertise redundant nodes. Sorry, not super familiar with Oracle technologies, least of all the Oracle-exclusive ones like RAC.

  2. Some other offerings I didn’t see before: http://akiban.com/ , http://www.dbshards.com/dbshards/ , http://www.translattice.com/ , and http://www.database.com/ . Not all are “cloud” solutions, but are still potentially relevant.
