Archive for the ‘DynamoBI’ Category

NoSQL Now 2011: Review of AdHoc Analytic Architectures

Август 30th, 2011

For those that weren’t able to attend the fantastic NoSQL Now Conference in San Jose last week, but are still interested in the slides about how people are doing Ad Hoc analytics on top of NoSQL data systems, here’s my slides from my presentation:

No sql now2011_review_of_adhoc_architectures

View more presentations from ngoodman
We obviously continue to hear from our community that LucidDB is a great solution sitting in front of a Big Data/NoSQL system. Allowing easy SQL access (including super fast, analytic database cached views) is a big win for reducing load *AND* increasing usability of data in NoSQL systems.

PlanetMySQL Voting: Vote UP / Vote DOWN

Column Store 101

Февраль 1st, 2011

I’m often asked, as an initial question of why LucidDB can perform so much better than traditional row store databases like Oracle, SQLServer, DB2, MySQL is HOW?

There’s a bunch of reasons and we have entire sections of our Wiki dedicated to the topic, but first and foremost is corner stone of almost every analytic database.  We’ve changed the way we orient the data on disk.  I’m not going to go into too much detail here, but I think as a start, a very simple example can help gain a little more understanding about how we can deliver an order of magnitude better performance over a row store.  More to come, as well.

Take the following query that is typical of BI workloads (we’re using commercial flight data available from the BTS):

select count(Carrier), Carrier, DayOfWeek, CancellationCode
group by Carrier, DayOfWeek, Cancellation

This query is typical – we want some aggregation (count of flights) grouped by several qualifying attributes (Carrier, DayOfWeek, CancellationCode).  We need to examine (in our example dataset) approximately 63 million records to answer this question.  In other words, we’re looking at all the data in a table to answer the query.

In a row store, even with indexes, a query that needs all the data from the database needs to touch all the data in the table.  This means THE ENTIRE TABLE is accessed from an I/O perspective.  Even though only a few columns or bits of data might be used, the ENTIRE row (that contains all the column data) is accessed off disk.

Let’s take our 63 million record table, that has approximately 99 columns.  Assuming (and this is bad assumption) that the row store takes the exact same amount of storage as a column store, the total table size is 3,025 MB.  In the row store, the data stored in rows is stored in blocks and is relatively uniform (ie, approximately 1000 rows / blocks and stored in 63,000 blocks).  In a column store, we store the columns separately.  Our storage for the same 3,025 MB breaks down like this (total is still 3,025 MB).

NOTE: Sorry for the graph labels! Sort of jumbled!

As you can see, some columns still continue to take up a fair amount of space (120MB +) but other columns are much much smaller.  The smaller columns are ones where values are repeated often (Year, Carrier, etc).

Here’s the gist.  Remember our SQL query typical in BI systems?

select count(Carrier), Carrier, DayOfWeek, CancellationCode group by Carrier, DayOfWeek, Cancellation

The SQL statement only access 3 columns (Carrier, DayOfWeek, CancellationCode).

In a Row Store, the database has to read the ENTIRE table from disk (3025 MB)

In a Column Store, the database only reads the columns it needs from disk (44 MB)

The column store is doing almost 1/100th of the work of the row store!  44 MB vs 3025 MB!

It isn’t magic.  It isn’t some magical breakthrough in CPU technology.

We’ve simply changed how we’re storing the data on disk so that asking the same question (SQL) on the same data (63 million rows) does far less work (44 MB vs 3025 MB) to give the same answer.

I’ll follow up with more on the this and other topics, but I hope this helps explain a very very basic reason of how LucidDB can deliver such fantastic improvements over otherwise very well performing OLTP databases like Oracle.


PlanetMySQL Voting: Vote UP / Vote DOWN

LucidDB has a new Logo/Mascot

Сентябрь 2nd, 2010

At yesterdays Eigenbase Developer Meetup at SQLstream’s offices in San Francisco we arrived at a new logo for LucidDB.  DynamoBI is thrilled to have supported and funded the design contest to arrive at our new mascot.  Over the coming months you’ll see the logo make it’s way out to the existing luciddb.org sites, wiki sites, etc.  I’m really happy to have a logo that matches the nature of our database - BAD ASS!


PlanetMySQL Voting: Vote UP / Vote DOWN

LucidDB: DynamoBI is running with it

Октябрь 24th, 2009

I can think of no better analogy than that of a multi leg race. You know, the races where one sprinter runs as fast as they can, before passing the baton to the next sprinter.

200910240932

First it was Broadbase.
Second it was LucidEra.
Third it was Eigenbase / LucidEra / SQLstream (joint development w/ Eigenbase).

Having purchased commercial rights from LucidEra it’s ours to run with now, alongside Eigenbase and SQLstream.

LucidDB has been described as the “best database no one ever told you about.” That stops today (the telling part, not the best part). Dynamo Business Intelligence Corp will take this great technology to a wider audience and we’ll be telling EVERYONE about it!

Over time, the exceptional features of this open source project will come to light (column store, bit map idxs, drop in java based user plugins, transparent remote JDBC data access, etc). I think it is important to acknowledge how LucidDB arrived to where it is today.

LucidDB is built by smart smart people (people wayyyy smarter than me!). People who’ve written parallel execution engines in Oracle. People who’ve developed Bitmap IDX implementations and helped file those patents. The heritage of LucidDB starts at Broadbase; LucidEra purchased it and brought it to Eigenbase. Eigenbase, and it’s sponsoring companies, have most claim to its current state. Their stewardship and ongoing evolution of the project is a testament to their talents and commitment to open source development. When you pick up LucidDB/DynamoDB and get your first “Ahhhh Cool! 10x Faster than my current database” you have LucidEra/SQLstream/Eigenbase devs to thank. John V. Sichi (lead and main project sponsor), Tai Tran, Julian Hyde, Rushan Chen, Zelaine Fong, Sunny Choi, Steve, Marc, Richard, Hunter, Edan, Damian, Boris, Benny, Stephan, Oscar, …. and the list goes on and on and on. Some of these people will be helping (in small and big ways) with the new company which is great for customers knowing that the people that wrote this stuff will be helping them be successful!

What’s the plan?

  • Open Source.
    Lots of it. Any readers of this blog, or who know me in general, will know I’m a “burn the boats,” open source kind of guy. We’ll be creating some new projects to make using the features/functions already in LucidDB easier. We’ll also be adding new features, which will make their way back into the LucidDB mainline.
  • Commercial in Name Only.
    Mainline DBMS enhancements and development continue, and will continue to be, in LucidDB (Eigenbase). New projects will be available under an OSI approved license. DynamoDB is the prepackaged, assembled, UI included distribution built for customers/evaluators that we’ll offer support on. Should be as easy as we can possibly make it to evaluate, purchase, and use.
  • In Progress.
    We’ve let the announcement ahead of having our website built, or having completed our own DynamoDB QA’ed build. Our open source roots guide us to an “early and often” approach and we’re taking that approach here. Be patient with us as we roll out the business bit by bit over the next few months. Our #1 priority: establish our support/build/qa infrastructure and get an already great piece of software into hands of people who can benefit from it.Hint: If you’ve ever done a star schema on MySQL you need to talk to us!

One thing in personally looking forward to is getting to work even more extensively with everyone involved at Eigenbase, including the very talented devs at SQLstream (who are building the best damn real time analytics/integration engine available).

Feel free to join up in taking LucidDB to a whole new level: Download LucidDB and give it a go yourself, since we just released a new version (0.9.2) yesterday! I believe, like others have already mentioned, adding a bit of commercial support behind an already great piece of software is a winning combination!

Drop a line on through to me if you’re interested in getting involved early on (as a charter customer, developer, user, etc). ngoodman at bayontechnologies (with the .COM).


PlanetMySQL Voting: Vote UP / Vote DOWN