Archive for the ‘architecture’ Category

OpenDBCamp: Information Lifecycle Architecture

Май 7th, 2011
The Open DB Camp in Sardinia 2011 has had a number of sessions on varying topics. Topics range from MySQL over MongoDB to replication and High Availability.

I decided to tap into the database expert resources present here at Sardegna Ricerche by discussing a non-database issue, where one can expert database experts to have insights beyond those of end users. And they did.

The topic was the particular case of information overload many of us suffer from on our hard disks: Too many files, too hard to find.
  • How do we find the bank statement from April 2007 from the more-seldom-used account?
  • What are the ten best work-related pictures from last year?
  • Is this the most current version of the presentation of BlackRay?
  • Are these films from Cagliari already backed up? Also offsite?
It turned out that I am not the only one suffering from a slight chaos on my hard disk. We all have some basic discipline we try to follow to keep things in order, but the consensus seemed to be that disorder on the hard disk is a psychological problem to be solved by good habits, more than a technical problem to be solved by an application. This in itself is a revolutionary insight, to come from a bunch of techies.

Before going into the individual points, let me first share how I had framed the discussion:
Many OpenSQLCamp attendees spend lots of time communicating about our SQL projects, internally and externally. We spend lots of time architecting database systems, and managing the lifecycle of products.

We do little to implement a proper architecture for the non-database information we create and manage, in business and privately. We drown in emails, digital pictures, versions of downloaded PDF documents, video snippets, and attachments sent by colleagues, partners and private friends. Chaos ensues.

Disorder and low productivity are inevitable unless we are very disciplined in following some basic rules for keeping order on our hard disks, pods and pads
. But what are those basic rules? And what tools can implement them?

I don't sit in with more than a rough first sketch of "an Information Lifecycle Architecture", but I'd like to share ideas, thoughts and attitudes with my fellow OpenSQLCamp attendees. I'll present some slides and guidelines, and will make an attempt at collecting your thoughts into a summary afterwards!
I threw in a couple of basic ideas on how to handle the type of information that we have to manage as individuals, usually on our own hard disks:
  1. Separate /pub from /rep: Store raw information in its original form in one directory tree, the "repository". Store distilled information ready to be consumed in a separate directory tree, the "publications".
  2. Limit the allowed /pub formats: Allow very few formats for publishing (such as .jpg .mov .pdf .mp3 .ogg but not .doc .ppt .xls .cr2 .psd .oo3 or anything even more "exotic").
  3. Delete systematically: Don't save many versions of the same file. Don't save information that isn't needed.
  4. Sync easily: Set up the directories (and configure your software) so that it's very easy to sync the published files with your mobile devices (Androids, iPhones, iPads, iPods, digi frames), regardless if PDFs, JPGs, MOVs or MP3s.
  5. Order files by type: Above /pub and /rep, separate files by rough category: Pictures, Movies, Documents, Music.
  6. Order files by year: Under /pub and /rep, separate most files into directories by year. Month or quarter would be too frequent for most personal information.
  7. Order files by common sense: Under the year (or in exceptional cases directly under /pub or /rep), separate files by placing them into a smart directory structure, which you yourself decide about according to the topic, as opposed to delegating the file structure to the random preferences of some software (like iPhoto).
Beat Vontobel, Liz van Dijk, Markus Popp, Sheeri Kritzer Cabral, Sergei Golubchik, René Cannao and others came with very good ideas and anecdotes. Let me here relate some of them, while they're in fresh memory:
  1. Blog your notes! Write your personal notes so that they're reusable for others. Publish them on your blog. Then you can use Google to find your own notes. I think this tip is smarter than what it sounds at first, i.e. it's applicable for quite a few situations.
  2. Use version control! For some who are familiar with version control anyway, it may make sense to put presentations and various types of other personal information into a version control system.
  3. Use the cloud! Put some of the information onto the cloud, for easy availability across machines, for easy synching, for backup.
  4. Tags for fields should be part of the operating system. You could tag expense reports, notes, contacts, pictures, films, documents and emails alike with #opendbcamp. The tagging should ideally work across operating systems.
  5. Order needs discipline. Any good habit of keeping order on the hard disk needs to be backed up by a commitment in time. If you slip once, and twice, and one more time, the discipline is lacking.
  6. Storage is cheap. Or is it? Here I noted two schools of thought. One would rather just tag anything and keep order by sorting. The other school would rather delete as much as possible, so that the remainder is smaller and hence easier to keep ordered. I belong to the latter one.
  7. Bad banks throw important yet unstructured information at you. You can get a bank account statement with a long filename which doesn't denote the year and month or bank account. You yourself have to parse the file, and name it properly. That's a burden even for a geeky OpenDBCamp visitor. Think of the poor average bank customers!
  8. The analog world forced you to have a physical relationship to your data. In order to use your CDs or spices or books, your mental maps of organising them were backed up by some physical structure. This physical structure is missing from digital data. It becomes easier to forget that you even have the information. We end up with a lot of pictures, music and videos we never use.
  9. Use Yojimbo http://www.barebones.com/products/yojimbo/ as an information organiser, if you're a Mac user.
  10. Does technology solve issues or create them? Earlier, we didn't have as many pics, films, CDs or books. Now, we have more of them, in a variety of forms. Does it really make sense to spend tens of hours sorting and otherwise maintaining your collections (of films, music, pictures)? Or is it better to have smaller collections, even of the seemingly "free" items such as digital pictures and films taken by yourself?
On that philosophic observation, let me end my personal notes from the "Information Lifecycle Architecture" session at the Open DB Camp, which I have now published and will be able to find later on by Googling it.

PlanetMySQL Voting: Vote UP / Vote DOWN

Outliers and coexistence are the new normal for big data

Апрель 1st, 2011

Letting data speak for itself through analysis of entire data sets is eclipsing modeling from subsets. In the past, all too often what were once disregarded as "outliers" on the far edges of a data model turned out to be the telltale signs of a micro-trend that became a major event. To enable this advanced analytics and integrate in real-time with operational processes, companies and public sector organizations are evolving their enterprise architectures to incorporate new tools and approaches.

Whether you prefer "big," "very large," "extremely large," "extreme," "total," or another adjective for the "X" in the "X Data" umbrella term, what's important is accelerated growth in three dimensions: volume, complexity and speed.

Big data is not without its limitations. Many organizations need to revisit business processes, solve data silo challenges, and invest in visualization and collaboration tools to make big data understandable and actionable across an extended organization.

"Sampling is dead"

When complete huge data volumes can be processed and analyzed at scale, "sampling is dead," says Abhishek Mehta, former Bank of America (BofA) managing director and Tresata co-founder, and speaker at last year's Hadoop World. Potential applications include risk default analysis of every loan in a bank's portfolio and analysis of granular data for targeted advertising.

The BofA corporate investments group adopted a SAS high performance risk management solution together with IBM BladeCenter grid and XIV storage to power credit-risk modeling, scoring and loss forecasting. As explained in a recent call with the SAS high-performance computing team, this new enterprise risk management system reduced calculation times at BofA for forecasting the probability of loan defaults from 96 hours to four hours. In addition to speeding up loan processing and hedging decisions, Bank of America can aggregate bottom-up data from individual loans for perhaps a more accurate picture of total risk than what was possible previously by testing models on just subsets of data.

nPario holds an exclusive license from Yahoo for technology based on columnar storage that within Yahoo's internal infrastructure handles over eight petabytes of data for advertising and promotion, per a February 2011 discussion with nPario President and CEO Bassel Y. Ojjeh. nPario has basically forked the code, so that Yahoo can continue their internal use while nPario goes to market with a commercial offering for external customers. The nPario technology enables analysis at the granular level, not just at aggregate or sampled data. In addition to supporting a range of other data sources, nPario offers full integration with Adobe Omniture, including APIs that can pull data from Omniture (although Omniture charges a fee for this download).

Electronic Arts uses nPario for an "insight's suite" that details how gamers engage with advertising. The nPario-powered EA analytics suite tracks clicks, impressions, demographic profiles, social media buzz and other data across EA's online, console game, mobile and social channels. The result is a much more precise understanding of consumer intent and ability to micro-target ads, over what was previously possible either with sampled data or with data limited to just online or shrink-wrapped and not across the complete range of EA's customer engagement.

Multiple big data technologies coexist in many enterprise architectures

CoexistenceIn many cases, organizations will use a mix-and-match combination of relational database management systems (RDBMS), Hadoop/MapReduce, R, columnar databases such as HP Vertica or ParAccel, or document-oriented databases. Also, there is growing adoption this year beyond just the financial services industry and government for complex event processing (CEP) and related real-time or near-real-time technologies to take action from web, IT, sensor and other streaming data.

At the same time that outliers are the new normal in data science, coexistence is quickly becoming the new normal for big data infrastructure and service architectures. For many enterprises and public sector organizations, the focus is "the right tool for the job" to manage structured, unstructured and semi-relational data from disparate sources. A few examples:

The Strata Online Conference, being held April 6, will look at how information — and the ability to put it to work — will shape tomorrow's markets. Scheduled speakers include: Gavin Starks from AMEE, Jeff Jonas from IBM, Chris Thorpe from Artfinder, and Ian White from Urban Mapping.

Registration is open
  • AOL Advertising integrated two data management systems: one optimized for high-throughput data analysis (the "analytics" system), the other for low-latency random access (the "transactional" system). After evaluating alternatives, AOL Advertising combined Cloudera Distribution for Apache Hadoop (CDH) with Membase (now Couchbase). This pairs Hadoop's capability for handling large, complex data volumes with Membase's capability for speed for sub-millisecond latency in making optimized decisions for real-time ad placement.
  • At LinkedIn, to power large-scale data computations of more than 100 billion relationships a day and low-latency site serving, they use a combination of Hadoop to process massive batch workloads, Project Voldemort, for a NoSQL key/value storage engine, and the Azkaban open-source workflow system. Further, they developed a real-time, persistent messaging system named Kafka for log aggregation and activity processing.
  • The Walt Disney Co. Technology Shared Services Group extended its existing data warehouse architecture with a Hadoop cluster to provide an integration mashup for diverse departmental data, most of which is stored separately by Disney's many business units and subsidiaries. With a Hadoop cluster that went into production for shared service internal business units last October, this data can now be analyzed for patterns across different but connected customer activities, such as attendance at a theme park, purchases from Disney stores, and viewership of Disney's cable television programming. (Disney case study summarized from PricewaterhouseCoopers, Technology Forecast, Big Data issue, 2010).

Centralization and coexistence at eBay

Even companies whose enterprise architecture more closely aligns with the enterprise data warehouse (EDW) vision associated with Bill Inmon than the federated model popularized by Ralph Kimball are finding themselves migrating their architectures toward greater coexistence to empower business growth. eBay offers an instructive example.

"A data mart can't be cheap enough to justify its existence," says Oliver Ratzesberger, eBay's senior director of architecture and operations. eBay has migrated to coexistence architecture featuring Teradata as the core EDW, Teradata offshoot named Singularity for behavioral analysis and clickstream semi-relational data, and Hadoop for image processing and deep data mining. All three store multiple petabytes of data.

Named after Ray Kurzweil's thought-provoking book "The Singularity is Near," the Singularity system at eBay is running production for managing and analyzing semi-relational data, using the same Teradata SQL user interfaces that are already widely understood and liked by many eBay staff. eBay's Hadoop instances still require separate management tools, and to date, still come with fewer capabilities for workload management than what eBay receives with its Teradata architecture.

Using this tripartite architecture, on eBay's consumer online marketplace, there are no static pages. Every page is dynamic, and many if not yet all ads are individualized. These technical innovations at eBay are helping to empower eBay's corporate resurgence, as highlighted in the March 2011 Harvard Business Review "How eBay Developed a Culture of Experimentation" interview with eBay CEO John Donahoe.

Coexistence at Bank of America

Bank of America operates a Teradata data warehouse architecture with Hadoop, R and columnar extensions along with: IBM Cognos business intelligence, InfoSphere Foundation Tools and InfoSphere DataStage; Tableau reporting; SAP global ERP reporting system; and Cisco telepresence for internal collaboration; among other technologies and systems.

R-specialist Revolution Analytics cites a Bank of America reference. In it, Mike King, a quantitative analyst at Bank of America, describes how he uses R to write programs for capital adequacy modeling, decision systems design and predictive analytics:

R allows you to take otherwise overwhelmingly complex data and view it in such a way that, all of a sudden, the choice becomes more intuitive because you can picture what it looks like. Once you have that visual image of the data in your mind, it's easier to pick the most appropriate quantitative techniques.

While Revolution Analytics is sponsoring a SAS to R Challenge for SAS customers to consider converting to R, coexistence between enterprise-grade software such as SAS and emerging tools such as R, is a more common outcome than a replacement or cutback in the number of current or future SAS licenses, as shown by Bank of America's recent investment described above in the SAS risk management offering.

For its part, SAS indicates that SAS/IML Studio (formerly known as SAS Stat Studio) provides one existing capability to interface with the R language. According to Radhika Kulkarni, vice president of advanced analytics at SAS, in a discussion about SAS-R integration on the SAS website: "We are busy working on an R interface that can be surfaced in the SAS server or via other SAS clients. In the future, users will be able to interface with R through the IML procedure."

To quote Bob Rodriguez, senior director of statistical development at SAS, from that website discussion: "R is a leading language for developing new statistical methods. Our new PhD developers learned R in their graduate programs and are quite versed in it." The SAS article added that: "Both R and SAS are here to stay, and finding ways to make them work better with each other is in the best interests of our customers."

Recent evolutions in big data vendors

As 10gen CEO and co-founder Dwight Merriman and new President Max Schireson described in a call March 8: "There have been periodic rebellions against the RDBMS." Intuit's small business division uses document-oriented MongoDB from 10gen for real-time tracking of website user engagement and user activities. Document-oriented CouchDB supporter CouchOne merged with key value store and memcached specialist Membase to form Couchbase; their customers include AOL and social gaming leader Zynga.

Customers had asked DataStax (previously named Riptano) for a roadmap for integrated Cassandra and Hadoop management, per an O'Reilly Strata conference discussion with DataStax CEO and co-founder Matt Pfeil and products VP Ben Werther. In March 2011, DataStax announced the Brisk integrated Hadoop, Hive and Cassandra platform, to support high-volume, high-velocity websites and complex event processing, among other applications that require real-time or near-real-time processing. According to DataStax VP of Products Ben Werther in a March 29 email: "Cassandra is at the core of Brisk and eliminates the need for HBase because it natively provides low-latency access and everything you'd get in HBase without the complexity."

Originating at Facebook and with commercial backing from DataStax, Cassandra is in use at Cisco, Facebook, Ooyala, Rackspace/Cloudkick, SimpleGeo, Twitter and other organizations that have large, active data sets. It's basically a BigTable data model running on an Amazon Dynamo like infrastructure. DataStax's largest Cassandra production cluster has more than 700 nodes. Cloudkick, acquired by Rackspace, offers a good discussion of their selection process that led to use of Cassandra: 4 months with Cassandra, a love story.

While EMC/Greenplum and Teradata/Aster Data started with PostgreSQL and moved forward from there, EnterpriseDB has continued to incorporate PostgreSQL updates. EnterpriseDB CEO Ed Boyajian and VP Karen Tegan Padir explained in a call last month that while much of the PostgreSQL initial work was to build databases for sophisticated users, EnterpriseDB has done more to improve manageability and ease of use, including a 1-click installer for PostgreSQL similar to Red Hat installer for Linux. EnterpriseDB envisions becoming for PostgreSQL what Cloudera has become for Hadoop: an integrated solution provider aimed a commercial, enterprise and public-sector accounts.

MicroStrategy is one of Cloudera's key partners for visualization and collaboration, and Informatica is quickly becoming a strong partner for ETL. To speed up what can be slow transfers in ODBC, Cloudera is building an optimized version of Sqoop. Flume agents support CEP applications, but it's not a big use case yet for Hadoop, per a call in February with Dr. Amr Awadallah, co-founder and VP of engineering, and marketing VP John Kreisa.

The following are additional examples of big data integration and coexistence efforts based on phone and in-person discussions with vendor executives in February and March 2011:

  • Adobe acquired data management platform vendor Demdex to integrate with the Omniture in the Adobe Online Marketing Suite. Demdex helps advertisers shift dollars and focus from buying content-driven placements to buying specific audiences.
  • Appistry extended its CloudIQ Storage with a Hadoop edition and partnership with Accenture for a Cloud MapReduce offering for private clouds. This joint offering runs MapReduce jobs on top of the Appistry CloudIQ Platform for behind-the-firewall corporate applications.
  • Together with its siblings Cassandra and Project Voldemort, Riak is an Amazon.com Dynamo-inspired database that Comcast, Mozilla and others use to prototype, test and deploy applications, with commercial support and services from Basho Technologies.
  • At CloudScale, CEO Bill McColl and his team offer a platform to help developers create applications designed for real-time distributed architectures.
  • Clustrix's clustered database system looks like a MySQL database "on the wire," but without MySQL code, to combine key-value stores with relational database functionality, with a focus on online transaction processing (OLTP) applications.
  • Concurrent supports an open source abstraction for MapReduce called Cascading that allows applications to integrate with Hadoop through Java API.
  • Within an enterprise and extending to its SaaS or social media data, Coveo offer integrated search tools for finding information quickly. For example, a Coveo user can search Microsoft SharePoint files or pull up data from Salesforce.com all from within her Outlook email browser.
  • Germany-based Exasol added a bulk-loader and increased integration capabilities for SAP clients.
  • Based on Big Table and other Google technologies, Fusion Tables are a service for managing large collections of tabular data in the cloud. You can upload tables of up to 100MB and share them with collaborators, or make them public. You can apply filters and aggregation to your data, visualize it on maps and other charts, merge data from multiple tables, and export it to the web or csv files.
  • Yale's Daniel Abadi and several of his colleagues unveiled Hadapt to run large and ad-hoc SQL queries with high velocity on both structured and unstructured data in Hadoop, to commercialize a project that began in the Yale computer science department.
  • IBM Netezza has partnered with R specialist Revolution Analytics add built-in R capabilities to the IBM Netezza TwinFin Data Warehouse Appliance. While Revolution Analytics has challenged SAS, they see more of a partner model with IBM Netezza and IBM SPSS. This may in part reflect the work career of Revolution Analytics President and CEO Norman Nie; prior to his current role, he co-invented SPSS.
  • Mapr targets speeding up Hadoop/MapReduce through a proprietary replacement for HDFS that can integrate with the rest Apache Hadoop ecosystem. (For a backgrounder on that ecosystem, refer to Meet the Big Data Equivalent of the LAMP Stack).
  • MarkLogic offers a purpose-built database using an XML data model for unstructured information for Simon & Schuster, Pearson Education, Boeing, the U.S. Federal Aviation Administration and other customers.
  • Microsoft Dryad offers a programming model to write parallel and distributed programs to scale from a small cluster to a large data center.
  • Pentaho offers an open source BI suite integrating capabilities for ETL, reporting, OLAP analysis, dashboards and data mining.
  • With its SpringSource and Wavemaker acquisitions, VMware is offering and expanding a suite of tools for developers to program applications that take advantage of virtualized cloud delivery environments. VMware's cloud application strategy is to empower developers to run modern applications that share information with underlying infrastructure to maximize performance, quality of service and infrastructure utilization. This extends VMware's virtualization business farther up into the software development lifecycle and provides incremental revenue for VMware while VMware positions itself for desktop virtualization to take off.

Data in the cloud

Data in cloudCloud computing and big data technologies overlap. As Judith Hurwitz at Hurwitz & Associates explained in a call on February 22: "Amazon has definitely blazed the trail as the pioneer for compute services." Amazon found they had extra capacity and started renting it out, but with little or no service level guarantees.

As Judith Hurwitz discussed, the data in the cloud market is starting to bifurcate. Private clouds are advancing the enterprise shared services model with workload management, self-provisioning and other automation of shared services. IBM, Unisys, Microsoft Azure, HP, NaviSite (Time Warner) and others are offering enterprise-grade services. While data in Amazon is pretty portable -- most services link with Amazon -- many APIs and tools are still specific to one environment, or reflect important dependencies, e.g., Microsoft Azure basically assumes a .Net infrastructure.

At the 1000 Genomes Project, medical researchers are benefiting from a cloud architecture to access data for genomics research, including the ability to download a public dataset through Amazon Web Services. For medical researchers on limited budgets, using the cloud capacity for analytics can save investment dollars. However, Amazon pricing can be deceptive as CPU hours can add up to quite a lot of money over time. To speed data transfers from the cloud, the project participants are using Aspera and its fasp protocol.

The University of Washington, Monterey Bay Aquarium Research Institute and Microsoft have collaborated on Project Trident to provide a scientific workflow workbench for oceanography. Trident, implemented with Windows Workflow Foundation, .NET, Silverlight and other Microsoft technologies, allows scientists to explore and visualize oceanographic data in real-time. They can use Trident to compose, run and catalog oceanography experiments from any web browser.

Pervasive DataCloud adds a data services layer to Amazon Web Services for integration and transformation capabilities. An enterprise with multiple CRM systems can synchronize application data from Oracle/Siebel, Salesforce.com and Force.com partner applications within a Pervasive DataCloud2 process. They can then use the feeds from that DataCloud process to power executive dashboards or business analytics. Likewise, an enterprise with Salesforce.com data can use DataCloud2 to synch with an on-premise relational database, or synch data between Salesforce.com and Intuit QuickBooks accounting software.

Big data jobs

All of this activity is welcome news for software engineers and other technical staff whose jobs may have been affected by overseas outsourcing. The monthly Hadoop user group meetups at the Yahoo campus now feature hundreds of attendees and even some job offers: many big data mega vendors and startups are hiring. For example, while Yahoo ended its own distribution of Hadoop, it has some interesting work underway with its Cloud Data Platform and Services including job openings there.

Cloudera counts 85 employees and continues to hire. Cloudera's Hadoop training courses are consistently sold out, including big demand from public sector organizations; the venture capital arm of the CIA, In-Q-Tel, became a Cloudera investor last month.

Recognizing big data's limits

To temper enthusiasm just a bit, 2011 is also a good time for a reality check to put big data into perspective. To benefit from big data, many enterprises and public sector organizations need to revisit business processes, solve data silo challenges, invest in virtualization and collaboration tools to help make big data understandable and actionable across an extended organization, and encourage more staff to develop "T-shaped" skills that combine deep technical experience (the T's vertical line) and wide business skills (the T's horizontal line).

Also, big data applications such as risk management software will not by themselves prevent the next sub-prime mortgage meltdown or the previous generation's savings and loan industry crisis. Decision-makers at financial institutions will need to make the right risk decisions, and regulatory oversight such as the new Basel rules for minimum capital requirements may play an important role too.

For more on big data technology and business trends, including a longer discussion on big data limitations, take a look at my recently published Putting Big Data to Work: Opportunities for Enterprises report on GigaOM Pro.


PlanetMySQL Voting: Vote UP / Vote DOWN

Calculating your database size

Сентябрь 25th, 2009

I generally use the following MySQL INFORMATION_SCHEMA (I_S) query to Calculate Your MySQL Database Size. This query and most others that access the MySQL INFORMATION_SCHEMA can be very slow to execute because they are not real tables and are not governed by physical data, memory buffers and indexes for example but rather internal MySQL data structures.

Mark Leith indicates in his post on innodb_stats_on_metadata that Innodb performs 8 random(ish) dives in to the index, when anybody accesses any of SHOW TABLE STATUS, SHOW INDEX, INFORMATION_SCHEMA.TABLES,INFORMATION_SCHEMA.STATISTICS for InnoDB tables. This can have an effect on performance, especially with a large number of Innodb tables, and a poor ratio of innodb_buffer_pool_size to disk data+index footprint.

What is even more incredible is when the result of this apparently harmless query causes the mysqld process to actual crash with a core dump due to these random index dives. The following core dump analysis highlights my query as the cause of the problem. This has happened now at least twice in for recent core crashes on a production environment.

(gdb) bt
#0 0x000000327280b6b2 in pthread_kill () from ./lib64/libpthread.so.0
#1 0x000000000055b136 in handle_segfault ()
#2 
#3 0x00000000007e1c21 in rec_get_offsets_func ()
#4 0x0000000000766007 in btr_estimate_number_of_different_key_vals ()
#5 0x000000000070d4c2 in dict_update_statistics_low ()
#6 0x000000000061fa84 in ha_innobase::info ()
#7 0x0000000000636972 in fill_schema_charsets ()
#8 0x0000000000639a66 in get_all_tables ()
#9 0x0000000000634633 in get_schema_tables_result ()
#10 0x00000000005bde37 in JOIN::exec ()
#11 0x00000000005bf7a7 in mysql_select ()
#12 0x00000000005c0127 in handle_select ()
#13 0x000000000056fcf0 in mysql_execute_command ()
#14 0x0000000000574c83 in mysql_parse ()
#15 0x00000000005751a0 in dispatch_command ()
#16 0x0000000000576483 in do_command ()
#17 0x0000000000577002 in handle_one_connection ()
#18 0x0000003272806367 in start_thread () from ./lib64/libpthread.so.0
#19 0x0000003271cd30ad in clone () from ./lib64/libc.so.6
Cannot access memory at address 0x3271cd3040 

This is an information_schema query that caused innodb to open a table.
This is totally normal. On first open, innodb tables get automatically  analyzed.
This analyze process crashed in innodb. 

This exact query *provoked* a crash: 

(gdb) x/1s 0x00002aaabc961dd0
0x2aaabc961dd0: "SELECT table_schema,table_name,engine,row_format,
table_rows, avg_row_length,
(data_length+index_length)/1024/1024 as total_mb,
(data_length)/1024/1024 as data_mb,
(index_length)/1024/1024 as index_mb,
CURDATE() AS today
FROM information_schema.tables
WHERE table_schema=@schema
ORDER BY 7 DESC"

The issue however is which table is the problem? How widespread is the corruption. Would an ALTER TABLE ENGINE=Innodb rebuild the table and eliminate the problem. Would an ANALYZE on an Innodb table identify the problem? (I doubt this second point). The problem however is even more significant due to the actual system. The largest single table of this 1TB database is 500GB. The impact of performing the ALTER, the time to undertake this blocking operation, the increase in the Innodb data file that can’t be reclaimed are just two factors that the inexperienced may fall victim of.

A saying I use is “Disaster is inevitable”. In this situation the disaster appears to not be significant but the ramifications due to the lack of appropriate and expert architectural design considerations to correct the problem are.

Is your environment capable of supporting this maintenance requirement? If not, then is the decision maker in your organization worried enough to seek the expert advice to address pro actively or will it be too late.


PlanetMySQL Voting: Vote UP / Vote DOWN

My favorite MySQL data type – DECIMAL(31,0)

Сентябрь 19th, 2009

It may seem hard to believe, but I have seen DECIMAL(31,0) in action on a production server. Not just in one column, but in 15 columns just in the largest 4 tables of one schema. The column was being used to represent a integer primary or foreign key column.

In a representative production instance (one of a dozen plus distributed production database servers) the overall database footprint was decreased from ~10 GB to ~2 GB, a 78% saving. In total, 15 columns across just 4 tables were changed from DECIMAL(31,0) to INT UNSIGNED.

One single table > 5GB was reduced to under 1GB (a 81% saving). This being my record for any GB+ tables in my time working with the MySQL database.

Had this server for example had 4GB of RAM, and say 2.5GB allocated to the innodb_buffer_pool_size, this one change moved the system from requiring more consistent disk access (4x data to memory) to being able to store all data in memory. Tests showed a clear improvement in Innodb buffer pool reads and hit ratio.

Today’s lesson as described in my 2008 conference presentation Top 20 design tips for data architects is, choose the right integer data type for your data.


PlanetMySQL Voting: Vote UP / Vote DOWN

Kickfire Basics — The KFDB columnar storage engine

Июль 29th, 2009

This is the first post in a new series of “Kickfire Basics” blog posts by myself and others here at Kickfire.  This series will review the basics of the Kickfire appliance starting from this post describing how data is stored on disk, to future posts on topics such as loading data into the appliance and writing queries which best leverage the capabilities of the SQL chip.

The Kickfire Equation
Column store + Compression + SQL Chip = performance

The Kickfire Analytic Appliance features the new KFDB storage engine which was built from scratch to handle queries over vast amounts of data.  KFDB is a column store in contrast to most MySQL storage engines which are row stores.  What follows is a description of our column oriented storage engine and how it improves performance over typical row stores.

This post concerns itself with the first part of the equation, the KFDB column store.

Column stores provide significant IO benefits over row stores

In general row stores are optimized for the quick storage and retrieval of many columns from a table for a small number of rows. Performance may suffer when a large number of rows must be accessed, particularly if only a small subset of columns must be accessed by the query.

In contrast, column stores perform very well when querying over a large number of rows, particularly if a small number of columns must be accessed, but they may struggle if asked to return only a small number of rows.  This is because instead of having to access entire rows of data, the column store can quickly retrieve data for only the subset of columns included in the query.

The column store uses 1/20th the I/O of a row store in this example diagram.

The column store uses 1/20th the I/O of a row store in this example diagram.

As you can see from the diagram, much less I/O is necessary to count the number of rows which match the WHERE  ‘Age < 15′ filter condition.

  • In order to determine which column values match the WHERE clause, a row store much read 5 entire rows, which reads 100 bytes from the table, assuming with overhead that 20 bytes of storage are used per row.
  • The column store needs only access the ‘Age’ column in order to answer the query, which reduces the amount of I/O significantly.  Column stores such as Kickfire also support column compression, which could reduce I/O even further.

In summary, column stores perform extremely well when a subset of the available columns are selected from the table because this reduces the amount of IO necessary to retrieve the values.

Beyond this point is a more technical description of how the KFDB storage engine stores rows on disk.

Columnar storage in KFDB in depth

The KFDB storage engine stores the data for each column into a segment.  Each segment is stored as a fixed-width structure on disk.  An individual segment contains data from only one column and one segment may belong to only a single table.  More than one segment can be grouped together into a column-group so that often retrieved groups of columns can be retrieved with reduced I/O costs.

In the example diagram above, each column (Pet Name, Pet Type, Age) will be stored in a separate segment and the “row #” column represents the row id (see below) for each row.

Column values are stored as fixed width on disk

The on disk width of a column is usually referred to as the “significant width” of the column.  The significant width is determined based on the data type, compression attributes and the values stored in the column.  The Kickfire loader chooses the best compression and storage attributes automatically during the loading process, but these values can be provided manually as well.  Later, when additional data is loaded, data may be “re-organized” into a different optimum format automatically.  We call this called data restructuring a “reorg”.

Each column is an array of values indexed by ROW_ID.

Each segment may be conceptualized as an array of values.  Each row in the database is identified by a ROW_ID or RID which represents the row number in the table.  To find any one column value, Kickfire multiplies the ROW_ID* significant_width and use this as an offset into the column.  This allows Kickfire to address rows in hardware and software very quickly using virtual addresses or VAs.  Each column has a base VA which is mapped into SQL chip memory for fast access.  The appliance smartly fetches appropriate ranges of columns based on what are called VA ranges, or VARs.

Columnar storage lends itself to sequential IO

When reading in an entire column, or a VA range, the database uses sequential IO to read the values into memory.  Sequential IO is much faster than the random IO.  Once the data is in memory, it may be addressed via VA lookups at extreme speeds.