Archive for the ‘luciddb’ Category

Two new open source data warehousing launches

October 28th, 2009

In our recent report on the data warehousing market we speculated that there would soon be a change in the number of vendors operating in what is a crowded market. We were anticipating that the number of vendors would go down, rather than up, but - in the short term at least - we have been proved wrong, as two new open source analytical databases emerged this week.

First came the formation of Dynamo Business Intelligence Corp (aka Dynamo BI), which offers a new commercially supported distribution of LucidDB and sponsors the project. Then came the launch of InfiniDB Community Edition, a new open source analytic database based on MySQL from Calpont.

Read the rest of this post on our Too Much Information blog.



Calpont opens up: InfiniDB Open Source Analytical Database (based on MySQL)

October 27th, 2009

Open source business intelligence and data warehousing are on the rise!

If you have been keeping up with the MySQL Performance Blog, you might have noticed a number of posts comparing the open source analytical databases Infobright, LucidDB, and MonetDB. LucidDB was in the news again last week when Nick Goodman announced that the Dynamo Business Intelligence Corporation will be offering services around LucidDB, branding it as DynamoDB.

Now, to top it off, Calpont has just released InfiniDB, a GPLv2 open source version of its analytical database offering, which is based on the MySQL server.

So, let's take a quick look at InfiniDB. I haven't yet played around with it, but the features sure look interesting:

  • Column-oriented architecture (like all other analytical database products mentioned)

  • Transparent compression

  • Vertical and horizontal partitioning: on top of being column-oriented, data is also partitioned, potentially allowing for less IO to access data.

  • MVCC and support for high concurrency. It would be interesting to see how much benefit this gives when loading data, because this is usually one of the bottlenecks for column-oriented databases.

  • Support for ACID/Transactions

  • High performance bulkloader

  • No specialized hardware - InfiniDB is a pure software solution that can run on commodity hardware

  • MySQL compatible


The website sums up a few more features and benefits, but I think this covers the most important ones.
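
To make the partitioning point a bit more concrete, here is a minimal sketch of why combining a column-oriented layout with horizontal partitioning can reduce IO. This is not InfiniDB's actual implementation: the `ColumnChunk` class, the `make_chunks` and `count_in_range` helpers, and the sample data are all hypothetical, and the min/max metadata per chunk is simply an assumption about how a partition could be skipped without being read.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    """One horizontal partition of a single column, plus min/max metadata."""
    values: list
    lo: int
    hi: int


def make_chunks(values, chunk_size):
    """Split one column into fixed-size chunks and record each chunk's range."""
    chunks = []
    for i in range(0, len(values), chunk_size):
        part = values[i:i + chunk_size]
        chunks.append(ColumnChunk(part, min(part), max(part)))
    return chunks


def count_in_range(chunks, lo, hi):
    """Count values in [lo, hi], reading only chunks whose range overlaps."""
    matches = 0
    chunks_read = 0
    for chunk in chunks:
        if chunk.hi < lo or chunk.lo > hi:
            continue  # metadata says no match is possible: skip the chunk
        chunks_read += 1
        matches += sum(1 for v in chunk.values if lo <= v <= hi)
    return matches, chunks_read


# Hypothetical "year" column. Other columns of the same table would live in
# their own chunk lists and are never touched by a query on "year" alone.
years = [2005, 2005, 2006, 2006, 2007, 2007, 2008, 2008, 2009, 2009]
year_chunks = make_chunks(years, chunk_size=2)

if __name__ == "__main__":
    print(count_in_range(year_chunks, 2008, 2009))  # (4, 2)
```

In this toy example the query touches only two of the five chunks of a single column, which is the kind of saving the feature list hints at. How much a real system gains depends on how well the data is clustered on the filtered column.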

Calpont also offers a closed source enterprise edition, which differs from the open source edition by adding multi-node scale-out support. By that, they do not mean regular MySQL replication scale-out. Instead, the enterprise edition features a true distributed database architecture which allows you to divide incoming requests across a layer of so-called "user modules" (MySQL front ends) and "performance modules" (the actual workhorses that partition, retrieve and cache data). In this scenario, the user modules break the queries they receive from client applications into pieces, and send them to one or more performance modules in parallel. The performance modules then retrieve the actual data from either their cache or from disk, and send it back to the user modules, which re-assemble the partial and intermediate results into the final resultset that is sent back to the client. (see picture)

[Image: shared-disk-arch-simple]

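The user module / performance module split described above is essentially a scatter-gather pattern. The following is a rough sketch of that idea only, not Calpont's code: the `performance_module` and `user_module` functions, the `PARTITIONS` data, and the use of threads to stand in for separate nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: one column of sales amounts, split into partitions,
# each served by its own "performance module".
PARTITIONS = [
    [10, 25, 40],
    [5, 60],
    [30, 15, 20, 70],
]


def performance_module(partition, threshold):
    """Scan one partition and return a partial result as (sum, count)."""
    matching = [v for v in partition if v >= threshold]
    return sum(matching), len(matching)


def user_module(threshold):
    """Fan a query out to all partitions in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials = list(
            pool.map(lambda p: performance_module(p, threshold), PARTITIONS)
        )
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


if __name__ == "__main__":
    # Roughly: SELECT SUM(amount), COUNT(*) FROM sales WHERE amount >= 20
    print(user_module(20))  # (245, 6)
```

The front end never scans data itself: it only splits the work and combines partial sums and counts, which is why adding more workers can help as long as the partial results are cheap to merge.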
Given the MySQL compatibility and otherwise similar features, I think it is fair to compare the open source InfiniDB offering to the Infobright community edition. Interesting differences are that InfiniDB supports all the usual DML statements (INSERT, DELETE, UPDATE), and that InfiniDB offers the same bulk loader in both the community edition and the enterprise edition: Infobright community edition does not support DML, and offers a bulk loader that is less performant than the one included in its enterprise edition. I have not heard of an Infobright multi-node option, so when comparing the enterprise edition feature sets, that seems like an advantage for Calpont's offering too.

Please understand that I am not endorsing one of these products over the other: I'm just doing a checkbox feature list comparison here. What it mostly boils down to is that users who need an affordable analytical database now have even more choice than before. In addition, it adds a bit more competition for the vendors, and I expect them all to improve as a result of that. These are interesting times for the BI and data warehousing market :)


Open source’s role in lowering the barriers to data warehousing

August 6th, 2009

As well as contributing to the CAOS research practice here at The 451 Group I am also part of the information management team, with a focus on databases, data caching, CEP, and - from the start of this year - data warehousing.

I’ve covered data warehousing before but taking a fresh look at this space in recent months it’s been fascinating to see the variety of technologies and strategies that vendors are applying to the data warehousing problem. It’s also been interesting to compare the role that open source has played in the data warehousing market, compared to the database market.

I’m preparing a major report on the data warehousing sector, for publication in the next couple of months. What follows is a rough outline of the role open source has played in the sector. Any comments or corrections much appreciated:

Unlike other sectors, where the role of open source has mostly been the disruption of incumbent proprietary vendors by commercial open source specialists, the impact of open source in the data warehousing sector has been more subtle, and arguably more pervasive.

Vendors such as Netezza and Greenplum have used the PostgreSQL database to build their data warehousing products, benefiting from the robust, mature PostgreSQL code base and reduced time to market. However, the end products of these development efforts are not open source.

For example, Netezza built its Netezza Performance Server (NPS) data warehouse appliances around Red Hat Linux and PostgreSQL, although the BSD license used by the PostgreSQL project enabled the company to do so without its resulting database having to be made available under an open source license. Additionally, Aster Data makes use of PostgreSQL as a data store on each node of its nCluster massively parallel data warehouse.

Similarly, Greenplum used PostgreSQL as the basis for its massively-parallel Greenplum Database, and also set up and supported the Bizgres distribution, with business intelligence and data warehousing specific contributions made available under the BSD license. However, that project fizzled out and the website is now closed, although Greenplum’s use of PostgreSQL continues.

Another example of PostgreSQL usage comes from Paraccel, which used the PostgreSQL optimizer code in version 1.0 of its Analytic Database in order to improve time to market. That is now being replaced by a new optimizer called Omne, which is specifically designed to support the MPP columnar architecture of Paraccel and its compression capabilities, unlike the SMP PostgreSQL optimizer, which was extended to support MPP. While Omne retains some elements of the open source PostgreSQL optimizer code base, Paraccel claims it will remove all PostgreSQL code from its products with an update to the Omne technology in 2010.

Additionally Vertica, which was founded by Mike Stonebraker, creator of PostgreSQL and Ingres, is a commercial implementation of the C-Store academic research project, which was also licensed under BSD.

It is also worth mentioning that prior to its acquisition by Microsoft, DATAllegro made use of a commercial license of the open source Ingres database within its data warehousing appliances. DATAllegro actually did most of the early development work for its first appliance using PostgreSQL, but decided to change to Ingres late in 2004 to make use of partitioning capabilities, backup utilities and optimizer features. Needless to say, Ingres is being replaced by Microsoft SQL Server in Microsoft’s forthcoming Madison data warehouse appliances.

LucidDB is another, often overlooked open source database, and was purpose-built for data warehousing. Based on technology developed by Broadbase Software, the code was picked up by erstwhile business intelligence SaaS provider LucidEra and combined with the Eigenbase data management framework to create LucidDB. Following LucidEra’s recent demise the LucidDB code is not currently commercially supported, although the non-profit Eigenbase Foundation is continuing to sponsor its development.

Despite this rampant use of open source code, it was not until Infobright launched Infobright Community Edition (ICE) in 2008 that we saw the first commercial open source vendor delivering its core warehouse software under an open source license. The Infobright columnar database acts as a storage engine for the MySQL database turning it into a realistic option for data warehouses of more than 200GB according to Infobright (Sun maintains that MySQL can perform as a stand-alone data-warehousing platform up to 2TB with the default MyISAM non-transactional storage engine).

While MySQL is not well known as a platform for data warehousing, Sun’s internal surveys indicate that data warehousing is the fifth-most-common use case for MySQL, which explains why it is not just Infobright that is looking to build a data warehousing business around MySQL.

Kickfire emerged in April 2008 with a beta version of its MySQL Appliance, which is built around the MySQL database and its SQL chip, which provides native instruction execution while operating directly out of memory on compressed data. Kickfire is targeting deployments in the 100GB-3TB range, while Infobright acts as a MySQL storage engine to enable use with up to 30TB of data. Infobright is developing a shared-everything, peer-to-peer architecture that will support up to 100 concurrent users and 100TB of data. Delivery is scheduled for the fourth quarter.

It remains to be seen whether Oracle will retain its commercial relationships with Kickfire and Infobright once its acquisition of Sun, and therefore MySQL, closes, but one company that has already been impacted by the acquisition is Calpont, which had planned to make a big splash at the recent MySQL Conference & Expo with the launch of its new strategy to provide a data-warehousing storage engine for the MySQL database.

The plan, to offer an open source column-oriented storage engine that will provide the MySQL database with the capabilities to function as a data warehouse, scaling from capacities of 100GB to 100TB, remains in place, although the storage engine will be in beta testing for the foreseeable future while Calpont waits to see what Oracle will do.

The most recent open source entrant into the data warehousing market is Ingres, which has teamed up with VectorWise, a database-engine spin-off from Amsterdam’s Centrum Wiskunde & Informatica (CWI) scientific research establishment, to collaborate on a new database-kernel project designed to better enable it to be positioned as a platform for data-warehouse and analytic workloads. The resulting software will be fully open source, although Ingres does not have detailed plans for the productization of the technology at this stage.

While open source is playing an increasing role in the data warehousing market, PostgreSQL has primarily taken the role of lowering barriers to entry for new vendors by providing a platform for the development of data warehouse-specific capabilities on a proven database platform.

MySQL serves a similar role for Infobright, Kickfire and Calpont, but could also play a significant role in lowering barriers to entry for new data warehousing customers with small volumes of data.

Calpont turned its attention to MySQL and the midrange market in order to exploit the requirement for scalable data-warehousing capabilities from MySQL’s estimated 11 million users, as well as the fact that the low-end of the market has not been well-supported by the existing data-warehousing vendors.

Sun estimates that 90% of all data warehouses have 6TB of data or less, while Kickfire estimates there are 17,000 addressable accounts that are trying to use MySQL to create data warehouses with volumes greater than 50GB.

These estimates explain why Sun et al see an opportunity for MySQL-based warehouses to grab a slice of the market based on low-cost systems targeting a large number of customers with small amounts of data – the complete inverse of the traditional focus for data warehousing, which is based on high-cost systems supporting large amounts of data and a relatively small number of potential customers.

Additionally, Kickfire, Infobright and Calpont are looking to replicate the strategy MySQL successfully followed in the database market by targeting a market niche that is not being served by the incumbents and avoiding head-on competition with the likes of Teradata, IBM, Oracle and Netezza.

