Archive for the ‘Kickfire’ Category

Why Kickfire is a fail in MySQL Data warehouse

August 5th, 2010
Even though data warehousing has been picking up very rapidly in the last year or so, the few companies that had already made their mark at the right time could not take market share easily, for a number of reasons. Even though I am not a marketing guy, some of the [...]

Two new open source data warehousing launches

October 28th, 2009

In our recent report on the data warehousing market we speculated that there would soon be a change in the number of vendors operating in what is a crowded market. We were anticipating that the number of vendors would go down, rather than up, but - in the short term at least - we have been proved wrong, as two new open source analytical databases emerged this week.

First came the formation of Dynamo Business Intelligence Corp, (aka Dynamo BI), a new commercially supported distribution, and sponsor, of LucidDB. Then came the launch of InfiniDB Community Edition, a new open source analytic database based on MySQL from Calpont.

Read the rest of this post on our Too Much Information blog.



EU Should Protect MySQL-based Special Purpose Database Vendors

September 12th, 2009
In my recent post on the EU antitrust regulators' probe into the Oracle Sun merger I did not mention an important class of stakeholders: the MySQL-based special purpose database startups. By these I mean:

I think it's safe to say the first three are comparable in the sense that they are all analytical databases: they are designed for data warehousing and business intelligence applications. ScaleDB might be a good fit for those applications, but I think its architecture is sufficiently different from the first three to not call it an analytical database.

For Kickfire and Infobright, the selling point is that they are offering a relatively cheap solution to build large data warehouses and responsive business intelligence applications. (I can't really find enough information on Calpont pricing, although they do mention low total cost of ownership.) An extra selling point is that they are MySQL compatible, which may make some difference for some customers. But that compatibility is in my opinion not as important as the availability of a serious data warehousing solution at a really sharp price.

Now, in my previous post, I mentioned that the MySQL and Oracle RDBMS products are very different, and I do not perceive them as competing. Instead of trying to kill the plain MySQL database server product, Oracle should take advantage of a huge opportunity to help shape the web by being a good steward, leading ongoing MySQL development, and in addition, enable their current Oracle Enterprise customers to build cheap LAMP-based websites (with the possibility of adding value by offering Oracle to MySQL data integration).

For these analytical database solutions, things may be different though.

I think these MySQL-based analytical databases really are competitive with Oracle's Exadata analytical appliance. Oracle could pose a serious threat to these MySQL-based analytical database vendors. After the merger, Oracle would certainly be in a position to hamper these vendors by restricting the non-GPL licensed usage of MySQL.
In a recent ad, Oracle vowed to increase investments in developing Sun's hardware and operating system technology. This would eventually put them in an even better position to create appliances like Exadata, allowing them to ditch an external hardware partner like HP (which is their Exadata hardware partner).

So, all in all, in my opinion the EU should definitely take a serious look at the dynamics of the analytical database market and decide how much impact the Oracle / Sun merger could have on this particular class of MySQL OEM customers. The rise of these relatively cheap MySQL-based analytical databases is a very interesting development for the business intelligence and data warehousing space in general, and means a big win for customers that need affordable data warehousing / business intelligence. It would be a shame if it were curtailed by Oracle. After the merger, Oracle sure would have the means and the motive, so if someone needs protection, I think it would be these MySQL-based vendors of analytical databases.

As always, these are just my musings and opinions - speculation is free. Feel free to correct me, add applause or point out my ignorance :)


The EC is mostly, but not entirely, wrong about Oracle/MySQL

September 4th, 2009

By now you are probably aware that the European Commission has decided to launch an extended investigation into Oracle’s acquisition of Sun based on concerns over MySQL.

The news has prompted a lot of criticism of the EC, much of it suggesting that the delay will do considerable harm to Sun (and therefore Oracle). This argument is valid - Sun’s already declining revenue has been in freefall since the deal was announced and one wonders how far it will fall in another 90 days of stasis.

Other criticism, (such as this from Matt Asay) focuses on the suggestion that the delay will do little to help MySQL or its users, and that the EC fails to understand open source.

This also has some validity. The EC talks about “Oracle’s incentive to further develop MySQL as an open source database” but as Matt points out “even Oracle can’t put the open-source genie back in the bottle once it has been released, as MySQL has, under the GNU General Public License.”

This is true, although I would argue that Oracle’s potential control over MySQL is not about licensing, but copyright. The FT states that Oracle “doesn’t control the IP, since the software is available under the GPL”. That is not entirely true. The existing code will always be under the GPL, but as the copyright for that code would be fully owned by Oracle, it is under no obligation to release future developments under the GPL.

I do not expect that to happen, but copyright ownership does not just impact the ability to license code, it also provides control over potential commercial uses of that code. This is where it could be argued that the EC could be right to have anti-competitive concerns over Oracle’s future ownership of MySQL (even if it doesn’t understand why, or hasn’t articulated that it does).

Criticism of the EC has also suggested that it is disproportionately focusing on a product with a tiny market share. There are various suggestions as to quite how small MySQL’s market share is, with the WSJ citing 0.2%, but also 1.5%, AHN 0.04%, the FT “around half a percentage point”.

What all these reports overlook is that MySQL’s influence is much greater than its market share, not only in terms of more widespread unpaid usage, but also in terms of the ecosystem of vendors that are building products based on MySQL to tap into its widespread adoption.

Examples include Kickfire, Infobright and Calpont in data warehousing, ScaleDB in shared-disk clustering, Tokutek in Web-application querying, and Schooner Information Technology and Virident Systems in caching appliances.

All of these products enable MySQL to better compete with Oracle’s database products, and many of these vendors have commercial relationships with Sun that enable them to use MySQL in proprietary products (while Infobright is itself open source, it also has a relationship with Sun).

Calpont also plans to offer an open source data warehouse based on MySQL but has put its plans on hold while it waits to see what Oracle will do with the MySQL database. Calpont’s concern is that Oracle will choose not to promote commercial relationships that use MySQL to compete more directly with Oracle’s Database business.

The MariaDB fork provides a potential alternative for these vendors, but as we previously discussed on this blog there are questions as to whether closed-source MySQL storage engines are compatible with MariaDB.

As noted in that post, ScaleDB’s Mike Hogan has argued that it can be done via an open source intermediary layer (and given that ScaleDB does not have a commercial arrangement with Sun, the company will be hoping that its analysis is correct), but MariaDB and MySQL creator Monty Widenius is not convinced: “This can only be done by buying MySQL licenses from Sun for each copy of MariaDB that is distributed.”

If Monty is correct then Oracle’s impending ownership of MySQL could theoretically have a significant impact on the emerging market for commercial products based on MySQL and their ability to compete with the Oracle Database.

As we noted in a report on the wider implications of Oracle’s impending ownership of MySQL (451 subscribers only) “For the commercial arrangements between these vendors and Oracle to survive, they will have to show that they can provide value to MySQL without impacting Oracle.”

Is that anti-competitive? Perhaps. I would argue that it certainly warrants further investigation.



Open source’s role in lowering the barriers to data warehousing

August 6th, 2009

As well as contributing to the CAOS research practice here at The 451 Group I am also part of the information management team, with a focus on databases, data caching, CEP, and - from the start of this year - data warehousing.

I’ve covered data warehousing before but taking a fresh look at this space in recent months it’s been fascinating to see the variety of technologies and strategies that vendors are applying to the data warehousing problem. It’s also been interesting to compare the role that open source has played in the data warehousing market, compared to the database market.

I’m preparing a major report on the data warehousing sector, for publication in the next couple of months. What follows is a rough outline of the role open source has played in the sector. Any comments or corrections much appreciated:

Unlike other sectors, where the role of open source has mostly been the disruption of incumbent proprietary vendors by commercial open source specialists, the impact of open source in the data warehousing sector has been more subtle, and arguably more pervasive.

Vendors such as Netezza and Greenplum have used the PostgreSQL database to build their data warehousing products, benefiting from the robust, mature PostgreSQL code base and reduced time to market. However, the end products of these development efforts are not open source.

For example, Netezza built its Netezza Performance Server (NPS) data warehouse appliances around Red Hat Linux and PostgreSQL, although the BSD license used by the PostgreSQL project enabled the company to do so without its resulting database having to be made available under an open source license. Additionally, Aster Data makes use of PostgreSQL as a data store on each node of its nCluster massively parallel data warehouse.

Similarly Greenplum also used PostgreSQL as the basis for its massively-parallel Greenplum Database and also set up and supported the Bizgres distribution with business intelligence and data warehousing specific contributions made available under the BSD license. However that project fizzled out and the website is now closed, although Greenplum’s use of PostgreSQL continues.

Another example of PostgreSQL usage comes from Paraccel, which used the PostgreSQL optimizer code in version 1.0 of its Analytic Database in order to improve time to market. That is now being replaced by a new optimizer called Omne, which is specifically designed to support the MPP columnar architecture of Paraccel and its compression capabilities, unlike the SMP PostgreSQL optimizer, which was extended to support MPP. While Omne retains some elements of the open source PostgreSQL optimizer code base, Paraccel claims it will remove all PostgreSQL code from its products with an update to the Omne technology in 2010.

Additionally Vertica, which was founded by Mike Stonebraker, creator of PostgreSQL and Ingres, is a commercial implementation of the C-Store academic research project, which was also licensed under BSD.

It is also worth mentioning that prior to its acquisition by Microsoft, DATAllegro made use of a commercial license of the open source Ingres database within its data warehousing appliances. DATAllegro actually did most of the early development work for its first appliance using PostgreSQL, but decided to change to Ingres late in 2004 to make use of partitioning capabilities, backup utilities and optimizer features. Needless to say, Ingres is being replaced by Microsoft SQL Server in Microsoft’s forthcoming Madison data warehouse appliances.

LucidDB is another, often overlooked open source database, and was purpose-built for data warehousing. Based on technology developed by Broadbase Software, the code was picked up by erstwhile business intelligence SaaS provider LucidEra and combined with the Eigenbase data management framework to create LucidDB. Following LucidEra’s recent demise the LucidDB code is not currently commercially supported, although the non-profit Eigenbase Foundation is continuing to sponsor its development.

Despite this rampant use of open source code, it was not until Infobright launched Infobright Community Edition (ICE) in 2008 that we saw the first commercial open source vendor delivering its core warehouse software under an open source license. The Infobright columnar database acts as a storage engine for the MySQL database turning it into a realistic option for data warehouses of more than 200GB according to Infobright (Sun maintains that MySQL can perform as a stand-alone data-warehousing platform up to 2TB with the default MyISAM non-transactional storage engine).

While MySQL is not well known as a platform for data warehousing, Sun’s internal surveys indicate that data warehousing is the fifth-most-common use case for MySQL, which explains why it is not just Infobright that is looking to build a data warehousing business around MySQL.

Kickfire emerged in April 2008 with a beta version of its MySQL Appliance, which is built around the MySQL database and its SQL chip, which provides native instruction execution while operating directly out of memory on compressed data. Kickfire is targeting deployments in the 100GB-3TB range, while Infobright acts as a MySQL storage engine to enable use with up to 30TB of data. Infobright is developing a shared-everything, peer-to-peer architecture that will support up to 100 concurrent users and 100TB of data. Delivery is scheduled for the fourth quarter.

It remains to be seen whether Oracle will retain its commercial relationships with Kickfire and Infobright once its acquisition of Sun, and therefore MySQL, closes, but one company that has already been impacted by the acquisition is Calpont, which had planned to make a big splash at the recent MySQL Conference & Expo with the launch of its new strategy to provide a data-warehousing storage engine for the MySQL database.

The plan, to offer an open source column-oriented storage engine that will provide the MySQL database with the capabilities to function as a data warehouse, scaling from capacities of 100GB to 100TB, remains in place, although the storage engine will be in beta testing for the foreseeable future while Calpont waits to see what Oracle will do.

The most recent open source entrant into the data warehousing market is Ingres, which has teamed up with VectorWise, a database-engine spin-off from Amsterdam’s Centrum Wiskunde & Informatica (CWI) scientific research establishment, to collaborate on a new database-kernel project designed to better enable it to be positioned as a platform for data-warehouse and analytic workloads. The resulting software will be fully open source, although Ingres does not have detailed plans for the productization of the technology at this stage.

While open source is playing an increasing role in the data warehousing market, PostgreSQL has primarily taken the role of lowering barriers to entry for new vendors by providing a platform for the development of data warehouse-specific capabilities on a proven database platform.

MySQL serves a similar role for Infobright, Kickfire and Calpont, but could also play a significant role in lowering barriers to entry for new data warehousing customers with small volumes of data.

Calpont turned its attention to MySQL and the midrange market in order to exploit the requirement for scalable data-warehousing capabilities from MySQL’s estimated 11 million users, as well as the fact that the low-end of the market has not been well-supported by the existing data-warehousing vendors.

Sun estimates that 90% of all data warehouses have 6TB of data or less, while Kickfire estimates there are 17,000 addressable accounts that are trying to use MySQL to create data warehouses with volumes greater than 50GB.

These estimates explain why Sun et al see an opportunity for MySQL-based warehouses to grab a slice of the market based on low-cost systems targeting a large number of customers and small amounts of data – the complete inverse of the traditional focus for data warehousing requirements, which is based on high-cost systems supporting large amounts of data and a relatively small number of potential customers.

Additionally, Kickfire, Infobright and Calpont are looking to replicate the strategy MySQL successfully followed in the database market by targeting a market niche that is not being served by the incumbents and avoiding head-on competition with the likes of Teradata, IBM, Oracle and Netezza.



High-Performance, Affordable, Open Data Marts

August 3rd, 2009

Departmental or subject-specific data warehouses - known as “data marts” in the industry - seem to be gaining in popularity.  Fueled partly by companies wanting to start small with focused projects in today’s economy, and partly by advances in data warehousing technology improving affordability and deployability, data marts seem to be popping up everywhere.

In most cases, data mart projects are driven by the head of a business unit or a functional group (like Sales) needing to analyze their own slice of data in order to run their department more efficiently and effectively.  The data may come directly from an operational system or a combination of source systems, resulting in what’s called an “independent data mart”, or it may come directly from a larger, enterprise data warehouse in a hub-and-spoke or “dependent data mart” configuration.

In either case today, according to industry analysts, companies are looking for data mart products that provide compelling price-performance and plug-and-play simplicity based on open architectures.

With our Kickfire Data Mart Appliance, we believe we have done just that.  By dramatically reducing the cost of high-performance data warehousing with our SQL Chip and ultra-modern column-store database, and by packing our technology in a true appliance, we have been able to achieve the industry’s leading price-performance and very compelling time-to-value. 

Furthermore, by leveraging the de facto standard open source database MySQL, our customers are able to design, develop, and deploy their data marts quickly and flexibly with the tools of their choice.  In this way, we’re able to provide high-performance, affordable, open data marts to allow businesses to respond to a market opportunity or competitive threat quickly and effectively.

Kickfire Basics — The KFDB columnar storage engine

July 29th, 2009

This is the first post in a new series of “Kickfire Basics” blog posts by myself and others here at Kickfire.  This series will review the basics of the Kickfire appliance starting from this post describing how data is stored on disk, to future posts on topics such as loading data into the appliance and writing queries which best leverage the capabilities of the SQL chip.

The Kickfire Equation
Column store + Compression + SQL Chip = performance

The Kickfire Analytic Appliance features the new KFDB storage engine which was built from scratch to handle queries over vast amounts of data.  KFDB is a column store in contrast to most MySQL storage engines which are row stores.  What follows is a description of our column oriented storage engine and how it improves performance over typical row stores.

This post concerns itself with the first part of the equation, the KFDB column store.

Column stores provide significant IO benefits over row stores

In general row stores are optimized for the quick storage and retrieval of many columns from a table for a small number of rows. Performance may suffer when a large number of rows must be accessed, particularly if only a small subset of columns must be accessed by the query.

In contrast, column stores perform very well when querying over a large number of rows, particularly if only a small number of columns must be accessed.  This is because, instead of having to access entire rows of data, the column store can quickly retrieve data for only the subset of columns included in the query.  Column stores may struggle, however, if asked to return only a small number of rows.

The column store uses 1/20th the I/O of a row store in this example diagram.

As you can see from the diagram, much less I/O is necessary to count the number of rows which match the WHERE ‘Age < 15’ filter condition.

  • In order to determine which column values match the WHERE clause, a row store must read 5 entire rows, which means reading 100 bytes from the table, assuming that 20 bytes of storage, including overhead, are used per row.
  • The column store needs to access only the ‘Age’ column in order to answer the query, which reduces the amount of I/O significantly.  Column stores such as Kickfire also support column compression, which could reduce I/O even further.

In summary, column stores perform extremely well when a subset of the available columns are selected from the table because this reduces the amount of IO necessary to retrieve the values.
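The arithmetic behind the example can be sketched in a few lines of Python. The row count and the 20 bytes per row come from the example above; the one-byte width for a stored ‘Age’ value is an assumption, chosen because it matches the 1/20th figure in the diagram caption. The function names are hypothetical.

```python
# Illustrative I/O arithmetic for the example query (count rows WHERE Age < 15).
ROW_COUNT = 5          # rows in the example table (from the post)
ROW_WIDTH_BYTES = 20   # bytes per row including overhead (from the post)
AGE_WIDTH_BYTES = 1    # assumption: one byte per stored Age value


def row_store_bytes(rows: int, row_width: int) -> int:
    """A row store scans every full row to evaluate the filter."""
    return rows * row_width


def column_store_bytes(rows: int, column_width: int) -> int:
    """A column store scans only the column named in the filter."""
    return rows * column_width


row_io = row_store_bytes(ROW_COUNT, ROW_WIDTH_BYTES)
col_io = column_store_bytes(ROW_COUNT, AGE_WIDTH_BYTES)
ratio = row_io // col_io
print(row_io, col_io, ratio)
```

Under these assumptions the row store reads 100 bytes and the column store reads 5, which is where the 1/20th ratio comes from.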

Beyond this point is a more technical description of how the KFDB storage engine stores rows on disk.

Columnar storage in KFDB in depth

The KFDB storage engine stores the data for each column in a segment.  Each segment is stored as a fixed-width structure on disk.  An individual segment contains data from only one column, and a segment may belong to only a single table.  More than one segment can be grouped together into a column-group so that columns which are often retrieved together can be read with reduced I/O costs.

In the example diagram above, each column (Pet Name, Pet Type, Age) will be stored in a separate segment and the “row #” column represents the row id (see below) for each row.

Column values are stored as fixed width on disk

The on-disk width of a column is usually referred to as the “significant width” of the column.  The significant width is determined based on the data type, compression attributes and the values stored in the column.  The Kickfire loader chooses the best compression and storage attributes automatically during the loading process, but these values can be provided manually as well.  Later, when additional data is loaded, data may be “re-organized” into a different optimum format automatically.  We call this data restructuring a “reorg”.
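As a rough illustration of how a fixed on-disk width could follow from the values stored, here is a minimal Python sketch. It assumes non-negative integers and ignores data type and compression attributes entirely. The function name significant_width is hypothetical, and this is not a description of how the Kickfire loader is actually implemented.

```python
def significant_width(values: list[int]) -> int:
    """Smallest whole number of bytes that can hold every value.

    Assumption: values are non-negative integers stored without compression.
    """
    if any(v < 0 for v in values):
        raise ValueError("this sketch only handles non-negative integers")
    largest = max(values, default=0)
    # bit_length() is 0 for the value 0, so always keep at least one byte.
    return max(1, (largest.bit_length() + 7) // 8)


ages = [3, 12, 7, 15, 1]
width_before = significant_width(ages)

# Loading a larger value later no longer fits in the old width,
# which is the kind of change that would call for a "reorg".
ages.append(300)
width_after = significant_width(ages)
print(width_before, width_after)
```

In this toy model the column needs one byte per value at first and two bytes per value once the larger value is loaded.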

Each column is an array of values indexed by ROW_ID.

Each segment may be conceptualized as an array of values.  Each row in the database is identified by a ROW_ID or RID which represents the row number in the table.  To find any one column value, Kickfire multiplies the ROW_ID by the significant width and uses the result as an offset into the column.  This allows Kickfire to address rows in hardware and software very quickly using virtual addresses or VAs.  Each column has a base VA which is mapped into SQL chip memory for fast access.  The appliance smartly fetches appropriate ranges of columns based on what are called VA ranges, or VARs.
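The offset calculation can be sketched in Python by treating a segment as a plain byte string. This only models the ROW_ID times significant width arithmetic. Virtual addresses, the SQL chip and the real on-disk encoding are not modelled, and the helper names, the little-endian layout and the two-byte width are assumptions made for the illustration.

```python
def encode_column(values: list[int], width: int) -> bytes:
    """Pack values into one contiguous fixed-width byte string (assumed layout)."""
    return b"".join(v.to_bytes(width, "little") for v in values)


def read_value(segment: bytes, row_id: int, width: int) -> int:
    """Fetch one value: the offset is simply row_id * width."""
    offset = row_id * width
    return int.from_bytes(segment[offset:offset + width], "little")


WIDTH = 2  # assumed significant width in bytes
ages = [3, 12, 7, 15, 1]
segment = encode_column(ages, WIDTH)

value_at_row_3 = read_value(segment, 3, WIDTH)
print(len(segment), value_at_row_3)
```

Because every value has the same width, no index or scan is needed to locate a row: the position follows directly from the row number.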

Columnar storage lends itself to sequential IO

When reading in an entire column, or a VA range, the database uses sequential IO to read the values into memory.  Sequential IO is much faster than random IO.  Once the data is in memory, it may be addressed via VA lookups at extreme speeds.
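To show why a fixed-width column lends itself to sequential reads, the following Python sketch fetches a contiguous range of rows with one seek and one read. The file name, the two-byte width and the helper names are assumptions for the example. It does not measure performance and does not reflect Kickfire's actual file format.

```python
import os
import tempfile

WIDTH = 2  # assumed significant width in bytes


def write_segment(path: str, values: list[int]) -> None:
    """Write a column as consecutive fixed-width values (assumed layout)."""
    with open(path, "wb") as f:
        for v in values:
            f.write(v.to_bytes(WIDTH, "little"))


def read_range_sequential(path: str, first_row: int, row_count: int) -> list[int]:
    """Read a contiguous range of rows with a single seek and a single read."""
    with open(path, "rb") as f:
        f.seek(first_row * WIDTH)
        raw = f.read(row_count * WIDTH)
    return [int.from_bytes(raw[i:i + WIDTH], "little")
            for i in range(0, len(raw), WIDTH)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "age.seg")  # hypothetical segment file
    write_segment(path, [3, 12, 7, 15, 1])
    middle = read_range_sequential(path, first_row=1, row_count=3)
print(middle)
```

The whole range is one contiguous run of bytes, so it can be pulled into memory in a single pass and then addressed by offset.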