Monday, August 30, 2010

Innovation in the MySql market

The previous post might have made you think I don't care much for MySql, and you'd be right. But there is some interesting innovation going on in the MySql world - it's just not in MySql itself.

Drizzle looks like it could be to MySql what FireFox was to Mozilla - a cleaned-up, stripped-down version that improves all the good stuff and leaves the dreck behind. Here's a terrific talk by Brian Aker (former architect for MySql and one of the movers behind Drizzle) that made me want to go out and try Drizzle in spite of my aversion to MySql.

Infobright is one of the new breed analytic DBMS's. It's build on MySql, but uses its own column-oriented storage engine, data compression, and optimizer. It looks interesting, and I see that one of their technical advisors is Roger Bodamer, who I worked for at OuterBay Technologies - a startup that was bought by HP in 2006. Roger is also the SVP of Engineering at 10gen, the company behind MongoDB - so it looks like he's sitting on both sides of the new SQL/NoSql fence. Smart guy.

Big Changes in the Data Warehouse Space

It's been about 2 1/2 years since I looked really seriously at the data warehouse/BI space. At the time, things seemed very stagnant - there were the big 3 or 4 vendors (Oracle, IBM, Microsoft, maybe Teradata) and a host of smaller players who either were getting gobbled up (Hyperion, Business Objects, Cognos, etc.) or spiraling downward (Sybase, MicroStrategy).

In fact, this seemed to be true of the DBMS space as a whole. A market where the trendiest and most disruptive technology is MySql is not an interesting market.

Turns out that wasn't the case at all. Lots of interesting things were bubbling beneath the surface then and have been emerging ever since. The big news in the software development world is NoSQL/MapReduce/Hadoop, because it's been about a decade since Object-Oriented databases failed and so it's time for another round of "relational databases suck and here's what's going to replace them". (Actually, I like the whole NoSQL thing. But it's not a replacement for DBMS's - instead, it's a big part of the solution for a new class of data storage and management problems. But I digress).

What I think is more interesting is the new breed of analytic DBMS engines - Vertica, GreenPlum, AsterData, ParAccel, VoltDB, InfoBright, Netezza - and the attempts by the established vendors to keep up (Oracle Exadata, and Microsoft's acquisition of DATAllegro and subsequent release of SQL Server 2008 R2 Parallel Datawarehouse). Weave in threads like the re-birth of the database appliance, the emergence of Solid-State Disk as a mainstream technology, and the adoption/adaptation of MapReduce by some of the above vendors, and now we've got something really interesting.

What's interesting about the technology of the new breed is the way different architectural trends are being synthesized together. Massively Parallel Processing (MPP) and shared-nothing architectures are nothing new in the database world, but here they've been strongly influenced by the architecture of scaling out on commodity hardware that owes more to Google and the other big web sites. Columnar storage, compression, and tiered storage schemes aren't new ideas either, but combining them together, especially in the context of a database appliance and/or the cloud is a big step forward.

The other really interesting development is a new approach towards analytics, known as "MAD" - Magnetic, Agile, and Deep. I won't pretend that I understand what it really means yet, other than to note that this is another area of software development where dissatisfaction with the status quo has resulted in a new wave of people and products who have adapted an Agile world-view.

I think EMC's acquisition of GreenPlum is an inflection point that, along with the success of Vertica and Aster Data (and the large installations that all three of them now have), legitimizes the new breed to the mainstream datawarehouse customer.

As an aside, it's worth noting that a lot of the interesting stuff going on can be traced either directly or indirectly to Mike Stonebraker. Vertica and VoltDB are his latest projects, GreenPlum and AsterData are built on top of Postgres, ParAccel also uses some Postgres technology, and DATAllegro was originally built on top of Ingres.