Archive for the ‘cloudera’ Category

Do We Need a New Programming Language for Big Data?

September 13th, 2010

I'm on the boards of two companies (Pentaho, Revolution Analytics) that are starting to see a lot of customer traction around Big Data. More and more companies in media, pharma, retail and finance are doing advanced analysis, reporting, graphing, etc. with massive data sets. It made me wonder what other areas of the technology stack might evolve with the trend towards Big Data. Obviously, there are new middleware layers like Hadoop and MapReduce, and we're also seeing the emergence of NoSQL data management layers with Cassandra, MongoDB, Membase and others. But what about programming languages?

OpenGamma CEO and resident genius Kirk Wylie wrote a post recently about why he wants a new programming language.

So why don't I have this language yet? Well, partially because programming language craftsmanship is hard. I'm pretty sure I'm not good enough to do it, which is usually my default criteria for saying something is Really Hard.

But I think as well the k3wl languages coming out are coming out of language requirements of the Top 10% crowd. They're the ones good enough to actually write the languages, and they're going to write a language that makes them happy. But then you end up with Scala, and then you end up with this monstrosity, and then you make me cry. A language in which that thing is even possible will never be a candidate as a Journeyman Programming Language.

You know who's going to do it? Someone like Gosling, who set about with the needs of the journeyman programmer in Java. But the state of the art has moved on, and Java just isn't suitable anymore.

Who I would really like to do it is Anders Hejlsberg. I am a very big fan of C#-the-Language. It's just that .Net-the-Ecosystem is so Microsoft-specific and horrific it'll never catch on in the wider world, no matter what Miguel de Icaza thinks.

This got me thinking about the challenge of the current complexity in Big Data systems. Today, you have to be near genius level to build systems on top of Cassandra, Hadoop and the like. These are powerful tools, but very low-level, equivalent to programming client-server applications in assembly language. When it works it's great, but the effort is significant and it's probably beyond the scope of mainstream IT organizations. (That's one reason that Revolution's R product has appeal, but R is a specialized statistical analysis tool, not a general purpose language.)

Could the Big Data complexity be factored out somehow with a new general purpose programming language? No doubt. Having worked with Anders on the creation of Delphi many years back, I know this is right up his alley. Or maybe we already have a good starting point with Erlang, Scala and Google's Go. Go is particularly interesting, having been designed by Rob Pike and Ken Thompson of Bell Labs / Unix fame.

What's been your experience in programming Big Data systems?  What do you think's needed?  Let me know in the comments below.

Zack Urlocker is an investor, advisor and board member to several startup software companies in SaaS and Open Source. He was previously the EVP of Products at MySQL, responsible for Engineering and Marketing. He built the MySQL Enterprise subscription strategy and product line. MySQL was sold to Sun for $1 billion and is now part of Oracle Corporation. He is also a marathon runner, blues guitarist and fan of Interactive Fiction.



MapReduce – DBInputFormat – Serialization on readers

July 20th, 2010
Last week I was working on an EC2 MySQL server where one of the slaves was taking a lot of time to catch up, and the only job running on that server was a MapReduce job accessing InnoDB tables for read-only metadata. Debugging it further, I noticed that every access to the database server is serialized with [...]

451 CAOS Links 2010.06.29

June 30th, 2010

Elephants on parade: Hadoop goes mainstream. And more.

Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca
“Tracking the open source news wires, so you don’t have to.”

Elephants on parade
# Cloudera launched v3 of its Distribution for Hadoop and released v1 of Cloudera Enterprise.

# Karmasphere released new Professional and Analyst Editions of its Hadoop development and deployment studio.

# Talend announced that its Integration Suite now offers native support for Hadoop.

# Yahoo announced the beta release of Hadoop with Security and Oozie, Yahoo’s workflow engine for Hadoop.

# Datameer announced a strategic partnership with Zementis for predictive analytics on Hadoop.

# The Register reported that Twitter is set to open source its MySQL-to-Hadoop tool.

# MicroStrategy announced support for Apache Hadoop as a data source for MicroStrategy 9.

# Appistry announced Hadoop-based strategic alliances with Concurrent, Datameer and Kitenga.

# GOTO Metrics released Data Analytics Platform, a Hadoop-based business intelligence platform.

Best of the rest
# The Software Freedom Law Center responded to the Supreme Court’s decision on Bilski v. Kappos, while Mark Radcliffe provided his thoughts.

# David Wiley discussed openness, radicalism, and tolerance (and the lack of it).

# Jorg Janke discussed how Compiere overstepped the balance between proprietary and open product components.

# Simon Phipps argued that open core is bad for software freedom.

# Nick Halsey joined SugarCRM as chief marketing officer.

# DotNetNuke more than doubled its subscription customers in 1H10 to nearly 800, expects 400% FY revenue growth.

# Nuxeo announced its new Nuxeo Case Management Framework.

# Mike Masnick discussed why the lack of billion dollar pure play open source software companies is a good thing.

# The Apache Software Foundation announced Apache Tomcat Version 7.0.

# Glyn Moody asked whether Oracle has been a disaster for Sun’s open source.

# Infoworld discussed eight business strategies for profiting from open source software.

# Computerworld reported that Red Hat's CEO sees VMware as its biggest competitor.

# IBM published an essay on the role Linux plays in its smarter planet initiative.

# Groklaw asked, What did Microsoft know about SCO’s plan to attack Linux, and when did it know it?

# Mozilla won the American Business Award for the most innovative company of the year.



Piper Jaffray on the Cloud

March 16th, 2010

Piper Jaffray has published a 300+ page study on the cloud computing industry based on a recent survey of 100 CIOs. Bottom line: cloud computing is expected to grow significantly over the next five years.

    Survey respondents expect the mix of cloud computing to escalate strongly to 13.5% in five years. This equates to a five-year CAGR of 19.2%, or 23.9% when we also incorporate IDC’s forecast that total software budgets will grow 4.7% annually. In other words, software spending will grow gradually in the next five years, but the mix of spend allocated to cloud-based applications will likely surge rapidly. Another way to think about the data is that the Cloud Computing market is expected to grow five times as fast as the broader software market: 23.9% vs. 4.7%.
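
The arithmetic behind the quoted figures can be checked with a short sketch. It only uses the numbers in the quote (13.5% mix in five years, 19.2% CAGR for the mix, 4.7% annual budget growth). The starting mix is not stated in the excerpt, so the value computed here is back-derived and should be read as an assumption. Note that 23.9% is the simple sum of the two rates; compounding them gives a slightly higher figure.

```python
# Sanity check of the growth figures quoted from the report.
# All inputs come from the quote; start_mix is back-computed, not quoted.

def implied_start(end_value, rate, years):
    """Value today that grows to end_value at a constant annual rate."""
    return end_value / (1 + rate) ** years

mix_in_five_years = 0.135   # cloud share of spend expected in five years
mix_cagr = 0.192            # annual growth of that share
budget_growth = 0.047       # IDC forecast for total software budgets

# Implied current cloud share (an inference, not a figure from the report).
start_mix = implied_start(mix_in_five_years, mix_cagr, 5)

# The quote's 23.9% matches simply adding the two rates.
additive = mix_cagr + budget_growth

# Compounding the two rates instead gives a slightly larger number.
compounded = (1 + mix_cagr) * (1 + budget_growth) - 1

# "Five times as fast": ratio of cloud growth to overall software growth.
ratio = additive / budget_growth

print(f"implied starting mix: {start_mix:.1%}")
print(f"additive growth:      {additive:.1%}")
print(f"compounded growth:    {compounded:.1%}")
print(f"ratio vs. software:   {ratio:.1f}x")
```

Either way of combining the rates supports the "roughly five times as fast" comparison; the difference between adding and compounding is under one percentage point.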

If anything, I think the prediction is conservative and the impact could be much larger in magnitude when mainstream adoption occurs.  But the risk is that adoption takes longer, just as it did for open source software.  And as the report indicates, open source is powering much of the cloud computing that's going on:

    The next-generation Cloud Computing data centers are NOT running Microsoft Windows; they are increasingly leveraging the compelling economics of open source components. For example, the data centers powering Amazon, Google, and salesforce.com all run on Linux and other open source technologies. In fact, Red Hat’s operating system and the MySQL database are key components to many of the leading-edge Clouds being developed today. 

Why is this occurring? Because open source leverages a global community development process which results in a product that evolves rapidly, provides transparency into the source code dynamics, and surpasses other products in terms of security and reliability – all at a lower total cost of ownership (TCO) than traditional offerings.



VMware, “Hey what ya’ building over there?”

January 5th, 2010

Today I caught a tweet from Kara Swisher referencing some exclusive news she posted on Boomtown about VMware’s upcoming deal to buy Zimbra from Yahoo! This would be VMware’s second acquisition of an open source ISV in under a year. In August 2009 VMware acquired open source Java vendor SpringSource, which not only developed the popular Spring framework but had also acquired open source systems management vendor Hyperic (May 2009) and commercial Apache support vendor Covalent (January 2009).

According to CNET’s Matt Asay, Yahoo!’s Zimbra business unit is still growing and has an impressive customer base:

Lost in the news of Zimbra’s release of version 6.0 of its collaboration suite is the importance of one very big number: 50 million. That’s how many paid mailboxes Zimbra claims now, a number that puts it within spitting distance of IBM Lotus Notes (approximately 145 million paid mailboxes) and Microsoft Exchange (approximately 175 million paid mailboxes). Whatever the truth to rumors that Zimbra is up for sale, Zimbra is an appreciating asset for Yahoo, not a depreciating one.

I also noticed a couple of months ago that VMware started to rebrand, getting rid of their old blue logo and moving to a grey logo sans the “virtualization boxes”. According to this post by VMware CMO Rick Jackson:

Now, as we look at our current offerings based on vSphere, and our vision of delivering the infrastructure for unrestrained cloud computing, the image we are portraying to the market has evolved.  In fact, our message embodies the notion of freeing IT from the constraints of physical resources.

Makes you wonder in the long-term where VMware might draw the line...

Build, Manage and Provide the Silver Lining for Clouds?

Does this signal the beginning of a broader VMware open source acquisition strategy? Maybe they will complete their Java application stack with a database. Barring Larry Ellison offering to sell MySQL to VMware, maybe there are some other opportunities. VMware might benefit from picking up EnterpriseDB, or maybe become the patron saint of the MySQL fork MariaDB, sponsored by MySQL creator Monty Widenius. Beyond the database, there are a number of interesting buying opportunities out there for VMware should they have their pocketbook open. For one, there is rPath, which can build and update Linux virtual machines and provide automated provisioning of systems, taking VMware’s management and deployment capabilities one step further.

Another option would be to get deeper into management by picking up one of the open source configuration management vendors, like Reductive Labs, which produces Puppet, or newly funded cloud configuration rival Opscode and their open source project, Chef. They could even go old school and take a look at CFengine, which is similar to Chef and Puppet but supports not only Unix-like systems but Windows too. Alternatively, they could acquire commercial open source vendor Cloudera, which provides support for Hadoop, an open source implementation of MapReduce that is ideally suited for cloud deployment.

I guess that’s enough speculation for today. However, it will be interesting to see if the deal goes through and if VMware pays a premium over Yahoo!’s acquisition price of $350 million back in 2007. As The VarGuy notes, it could trigger a reset for how open source companies are valued.

