Archive for the ‘cassandra’ Category

[RELOADED] Vote for MySQL+ community awards 2011 !

January 5th, 2012

[UPDATE 2011/01/11]: New poll added, vote for the best GUI client tool! (And keep voting in the other polls.)
And thanks again for your involvement. It’s time to vote again…

First of all, I wish you a happy new year.
Many things happened last year, and it was really exciting to be involved in the MySQL ecosystem.
I hope this enthusiasm will grow this year; it's up to you!

To start the year, I propose the MySQL+ Community Awards 2011.
It will only take 5 minutes to fill out these polls.
Answer with your heart first, and then with your experience of these tools or services.

Polls will close on January 31, so vote now!
For “other” answers, please leave me a comment with details.

Don’t hesitate to propose other tools or services in the comments.
And, please, share these polls!

 

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

Happy 2012!
Cédric

This article is obviously not sponsored!
(MySQL is a trademark of Oracle Corporation and/or its affiliates)


New England Database Summit

November 15th, 2011

The New England Database Summit is an all-day conference-style event where participants from the research community and industry in the New England area can come together to present ideas and discuss their research and experiences working on data-related problems.  It is an academic conference with applications to real life, and includes any type of database.

The 5th annual NEDB will be held at MIT in Cambridge, MA (room 32-123) on Friday, February 3, 2012.  Anyone who would like to is welcome to present a poster (registration required) or submit a short paper for review.  We plan to accept 8-10 papers for presentation (15 minutes each) at the meeting.  All posters will be accepted.

For more details, and to register and / or upload a paper, see:

http://db.csail.mit.edu/nedbday12/



451 CAOS Links 2011.10.18

October 18th, 2011

DOCOMO adopts, invests in Couchbase. Apache Cassandra reaches 1.0. And more.

# DOCOMO Innovations adopted Couchbase as DOCOMO Capital invested in the NoSQL database vendor.

# The Apache Software Foundation announced Apache Cassandra v1.0.

# Nuxeo announced the availability of Nuxeo Cloud.

# SGI formed a distribution relationship with Cloudera and announced a record-breaking performance benchmark.

# Rapid7 announced the launch of Metasploit Community Edition.

# VoltDB announced the general availability of VoltOne.

# Juniper Networks licensed OpenNMS to add fault and performance management capabilities to the Junos Space software platform.

# The Free Software Foundation warned against Microsoft’s “Secure Boot” system.



Some Videos from 2010 OpenSQL Camp Boston

October 25th, 2010

 

 

OpenSQLCamp Boston has only been over for a week, but I already have about 2/3 of the videos uploaded to YouTube.  I have updated the schedule page with all the videos and slides I knew about.  I welcome comments with more information (e.g. links to slides, or tag or description suggestions for the YouTube videos).

Here's the list of videos and slides so far (also linked at http://opensqlcamp.org/Events/Boston2010/Schedule):

Adventures in Alternative Energy "Data Monitoring" with MySQL -- architecture and design case study - Matt Yonkovit, Percona - video

Cassandra and Lucene - Jake Luciani, Riptano - video - slides(slideshare)

Common MySQL Performance Blunders - Matt Yonkovit, Percona - video

Databases of the Future (Discussion) - Josh Berkus, PostgreSQL Experts - video

Keeping MySQL slaves in sync using Maatkit tools (mk-table-checksum, mk-table-sync) - Sheeri Cabral, PalominoDB, with input from Matt Yonkovit, Percona - video - slides (PDF)

 

The MariaDB Server -- What do you want from it? (intro + discussion) - Colin Charles, Monty Program AB - video

MySQL replication and the quest for a global transaction ID - Giuseppe Maxia, MySQL - video

MySQL Tuner 2.0 - Sheeri Cabral, PalominoDB - video - slides (PDF)

 

MVCC Unmasked: Implementation and Issues in Postgres, Cassandra, MySQL and CouchDB - Bruce Momjian, EnterpriseDB; Jake Luciani, Riptano; Rob Wultsch, GoDaddy; Josh Berkus, PostgreSQL Experts - video

Serialization in distributed DBS - Josh Berkus, PostgreSQL Experts - video

 

 

SQL Meets NoSQL: Mapping relational semantics onto a true multi-master, eventually consistent database (SlackDB) - Eric Day, Rackspace - video - note that there were no slides, only a discussion, so it is really audio, but it is up on YouTube, so there is a video.

Teaching Developers SQL (Discussion) - Led by Josh Berkus, PostgreSQL Experts - video

 



OpenSQLCamp Boston in Detail

October 14th, 2010

In short:

Register / see who's coming

Schedule (will be filled in with presentations before Saturday noon)

Session ideas (45-minute sessions)

Friday, October 15th - 6-10 pm, WorkBar Boston, 711 Atlantic Ave, Boston, in the basement.  Socializing, swag, raffles, dinner, beer and soft drinks.  Take public transit (South Station on the Red Line subway or Silver Line bus if coming from the airport) or a cab; parking can be quite expensive in that area.

Saturday, October 16th - 8:30 am - 5 pm, MIT Stata Center 1st floor, 32 Vassar Street, Cambridge.  Breakfast, lunch, tech presentations.  A short walk from the Kendall Square subway stop on the Red Line, or drive and park in any MIT lot -- even if it says parking permit only, that does not apply on the weekends.

Sunday, October 17th - same as Saturday

-----

The longer form:

As many of you know, OpenSQLCamp Boston kicks off tomorrow night with a social event at WorkBar Boston from 6-10 pm, and will include a buffet dinner from the Pulse Cafe.  Even though WorkBar Boston is more "work" than "bar" - it is a coworking space - there will be beer as well (special thanks to IOUG for sponsoring this event in particular).  Whether or not you are drinking, I strongly recommend taking public transit or a taxi -- the location is across the street from South Station, a major bus and train hub.  South Station is on the Red Line of the subway, and there is also a Silver Line bus directly from the airport terminals. The subway and Silver Line fare is $2.00.

Make sure to get some sleep, because Saturday starts at 8:30 am at the MIT Stata Center, 32 Vassar Street.  We start with breakfast, and then after a few introductory remarks we start making the schedule at 9:30 am.  Then there are three 45-minute sessions, with lunch at 1 pm, a panel on indexing from 2-3 pm, and open time from 3-5 pm to ask questions, work on projects that were discussed during the day, and otherwise hack during the hackathon.

OpenSQLCamp does not provide dinner, but usually at the end of the day people figure out where they want to go next, and we all go over to a bar or restaurant (or go to a few different ones depending on people's preferences and tastes). 

Sunday is the same schedule as Saturday, except there is an extra session slot because we do not need opening remarks and the planning session. 

Here's the detail of food, for those who are wondering:

Friday night catered by Pulse Cafe

Vegan and vegetarian appetizers, wraps (incl. vegan), salad, vegetarian lasagna.  Beer, soda, water, iced tea.

Saturday and Sunday breakfast catered by Panera Bread

Fruit, bagels, pastries, hot egg & cheese and ham, egg & cheese sandwiches, coffee, tea.

Saturday lunch catered by Greek Corner

Hummus, grape leaves, pastitsio, falafel, gyros, Greek salad (feta on the side)

Sunday lunch catered by Pita Pit

Assorted pitas including meat, vegetarian and vegan options.

I am very excited, and can't wait to see you there!



How Real is the Data Deluge?

October 11th, 2010


It seems obvious that given the decreasing cost of storage and computation, there's going to be a significant increase in the volume of data that organizations accumulate over the next 10 years.  But the type of data being accumulated may be different from the areas where traditional DBMSs dominated.  It's not just about transactions; it's search patterns, on-line behavior, click-thru data, events fired off by smartphones, messages over Twitter & Facebook, log data of various kinds.

If an organization can figure out a better way to identify prospects, deliver more targeted ads, or optimize pricing decisions by analyzing terabytes of data, they'd be crazy not to. Over the long term, companies that don't develop these capabilities will be at a competitive disadvantage.

As to what the implications are from a technological perspective, that's a whole different can of worms. I'm starting to see adoption of Big Data technologies like Hadoop, HDFS, Cassandra, MongoDB, XML databases, analysis with R, Pentaho, and loads of other technologies.  And MySQL continues to play a role here as do other traditional relational databases.  Over the next few months, I'm going to dig down deeper with people using these technologies to try and discern the emerging customer patterns.

If you're in this space or using some of these technologies, let me know your thoughts. What volume of data are you dealing with?  How many nodes or servers are you using?  Are you running on a public cloud, private cloud or hybrid? What technologies did you evaluate?  What about traditional DBMSs didn't work for this scenario? 



LCA Miniconf Call for Papers: Data Storage: Databases, Filesystems, Cloud Storage, SQL and NoSQL

September 29th, 2010

This miniconf aims to cover many of the current methods of data storage and retrieval and attempt to bring order to the universe. We’re aiming to cover what various systems do, what the latest developments are and what you should use for various applications.

We aim for talks from developers of and developers using the software in question.

Aiming for some combination of: PostgreSQL, Drizzle, MySQL, XFS, ext[34], Swift (open source cloud storage, part of OpenStack), memcached, TokyoCabinet, TDB/CTDB, CouchDB, MongoDB, Cassandra, HBase….. and more!

Call for Papers open NOW (Until 22nd October).



Do We Need a New Programming Language for Big Data?

September 13th, 2010


I'm on the boards of two companies (Pentaho, Revolution Analytics) that are starting to see a lot of customer traction around Big Data. More and more companies in media, pharma, retail and finance are doing advanced analysis, reporting, graphing, etc. with massive data sets. It made me wonder what other areas of the technology stack might evolve with the trend towards Big Data.  Obviously, there are new middleware layers like Hadoop and MapReduce, and we're also seeing the emergence of NoSQL data management layers with Cassandra, MongoDB, Membase and others.  But what about programming languages?

OpenGamma CEO and resident genius Kirk Wylie wrote a post recently about why he wants a new programming language.

So why don't I have this language yet? Well, partially because programming language craftsmanship is hard. I'm pretty sure I'm not good enough to do it, which is usually my default criteria for saying something is Really Hard.

But I think as well the k3wl languages coming out are coming out of language requirements of the Top 10% crowd. They're the ones good enough to actually write the languages, and they're going to write a language that makes them happy. But then you end up with Scala, and then you end up with this monstrosity, and then you make me cry. A language in which that thing is even possible will never be a candidate as a Journeyman Programming Language.

You know who's going to do it? Someone like Gosling, who set about with the needs of the journeyman programmer in Java. But the state of the art has moved on, and Java just isn't suitable anymore.

Who I would really like to do it is Anders Hejlsberg. I am a very big fan of C#-the-Language. It's just that .Net-the-Ecosystem is so Microsoft-specific and horrific it'll never catch on in the wider world, no matter what Miguel de Icaza thinks.

This got me thinking about the challenge of the current complexity in Big Data systems.  Today, you have to be near genius level to build systems on top of Cassandra, Hadoop and the like.  These are powerful tools, but very low-level, equivalent to programming client-server applications in assembly language.  When it works it's great, but the effort is significant and it's probably beyond the scope of mainstream IT organizations.  (That's one reason that Revolution's R product has appeal, but R is a specialized statistical analysis tool, not a general purpose language.)

Could the Big Data complexity be factored out somehow with a new general purpose programming language?  No doubt. Having worked with Anders on the creation of Delphi many years back, I know this is right up his alley.  Or maybe we already have a good starting point with Erlang, Scala and Google's Go.  Go is particularly interesting, having been designed by Rob Pike and Ken Thompson of Bell Labs / Unix fame.

What's been your experience in programming Big Data systems?  What do you think's needed?  Let me know in the comments below.

Zack Urlocker is an investor, advisor and board member to several startup software companies in SaaS and Open Source. He was previously the EVP of Products at MySQL, responsible for Engineering and Marketing. He built the MySQL Enterprise subscription strategy and product line. MySQL was sold to Sun for $1 billion and is now part of Oracle Corporation. He is also a marathon runner, blues guitarist and fan of Interactive Fiction.



Digg’s main competitor (Reddit) runs Cassandra, but Digg’s VP of Engineering was fired for the decision to switch.

September 8th, 2010

Apparently, Digg performed a big migration from MySQL to Cassandra along with a big migration to its new Digg v4 architecture, and now its VP of Engineering has been shown the door:

Ever since Digg launched its new site design, it’s been plagued with all kinds of trouble, not least of which is that it keeps going down. The problems with the new architecture are so bad that VP of Engineering John Quinn is now gone, we’ve confirmed with sources close to Digg.

In a Diggnation video today, CEO Kevin Rose explained some of the technical issues the site is dealing with and why it can’t simply roll back to the previous architecture. The new version of Digg, v4, is based on a distributed database called Cassandra, which replaced the MySQL database the site ran on before. Cassandra is very advanced—it is supposed to be faster and scale better—but perhaps it is still too experimental. Or maybe it’s just the way Digg implemented it (Twitter uses Cassandra, although not for its main data store, as does Facebook in places, but it obviously is not as battle-tested as it needs to be). Every engineer at Digg is currently just trying to keep the site up and running.

Some of this is political. Perhaps Mr. Quinn was excused for other reasons above and beyond this switch.

Perhaps he should have had buy-in from other members of the team. Had Rose personally signed off on this migration, it would have been tough to fire their VP of Engineering.

The technical aspects of this type of migration are VERY difficult. Not just because you’re moving from one DB to another, but because a lot of the polish, fit, and finish of your existing system tends to be taken for granted over time.

Newer databases don’t have this type of polish and you end up having to duplicate a lot of infrastructure that’s already present on the previous generation.

MySQL is definitely no panacea. You’re going to have pain either way. At least with some of the modern DBs you’re partially headed in the right direction.

One trend I’ve seen is for people to use the LAMP stack to serve websites but then to use Hadoop + Hive as part of their ETL setup so they can run reports and transform production data.

There is no solid Bigtable implementation just yet. I wish there were, but it doesn’t seem like we have one.

Cassandra isn’t that bad, of course. Reddit, Digg’s main competitor, is running Cassandra.

Seems like a strange thing to fire someone over. If your main competitor is running the same database, the decision to switch certainly couldn’t have been too bad.




Cassandra and Ganglia

September 4th, 2010

[Graph: cassandra_tpstats_row_read_stage_completed]

I finally got some time to do some house cleaning. One nagging piece of low-hanging fruit was to stop running jconsole on one screen to see the state of all my Cassandra boxes. I created a Ganglia script to graph what is shown above: all the Cassandra servers and their total row read stages as a gauge. That is, I graph the delta between Ganglia script runs, which gives me the reads over time.
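As a rough illustration of the delta-between-runs idea, here is a minimal sketch in Python (the actual scripts are Perl and are not shown here). The function name, the JSON state file, and the choice to skip a counter on its first sighting or after a reset are all assumptions made for the example.

```python
import json
import os


def deltas_since_last_run(current, state_path):
    """Return per-counter deltas since the previous run, then save the new values.

    `current` maps a counter name to its cumulative value.
    `state_path` is a hypothetical file holding the values from the last run.
    """
    previous = {}
    if os.path.exists(state_path):
        with open(state_path) as fh:
            previous = json.load(fh)

    deltas = {}
    for name, value in current.items():
        # Only report a delta when we have an earlier, smaller-or-equal value.
        # A first sighting or a counter reset (for example after a node
        # restart) is skipped for this run instead of being guessed at.
        if name in previous and value >= previous[name]:
            deltas[name] = value - previous[name]

    with open(state_path, "w") as fh:
        json.dump(current, fh)
    return deltas
```

Calling it on every script run with the latest completed counts would yield the number of reads since the previous run, which is what gets graphed.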


How I have it set up is:

All data exposed by JMX to produce tpstats and cfstats is graphed via Ganglia. The pattern for each graph name is as follows:

cass_{stat_class}_{key}

stat_class - tpc, tpp, or tpa, meaning completed, pending, or active, respectively
key - the name of the stat, for instance message deserialization
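The naming pattern above could be built along these lines. This is a Python sketch, not the Perl from the scripts; the function name and the way the key is normalised (lowercased, with runs of non-alphanumeric characters turned into underscores) are assumptions.

```python
import re

# Prefixes as described in the post: completed, pending, active.
STAT_CLASS_PREFIX = {"completed": "tpc", "pending": "tpp", "active": "tpa"}


def metric_name(stat_class, key):
    """Build a cass_{stat_class}_{key} metric name."""
    prefix = STAT_CLASS_PREFIX[stat_class]
    # Assumed normalisation: lowercase, anything non-alphanumeric becomes "_".
    slug = re.sub(r"[^a-z0-9]+", "_", key.lower()).strip("_")
    return f"cass_{prefix}_{slug}"
```

For example, `metric_name("completed", "Message Deserialization")` gives `cass_tpc_message_deserialization`.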

For column family stats I graph the keyspace stats as well as the specific column family stats exposed by cfstats. For instance, see below:

[Graph: cassandra cfstats with ganglia]

If you're interested in the scripts, I'll send them to you or put them up on code.google.com. They are written in OOP Perl and take the same packaging approach as the Maatkit toolkit for MySQL by Xaprb and crew (all the "classes" live in the same file as the application).

GmetricDelegate is the parent package.
GmetricCassandra extends GmetricDelegate, overrides getData, and defines which stats are absolute values versus gauges.

Following the same pattern, I also have
GmetricInnoDB
GmetricMySQL

and so on.
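The parent/child arrangement described above might look roughly like this. It is a Python sketch of the pattern only; apart from the two class names and getData (spelled `get_data` here), everything is hypothetical, including the `gauges` attribute, the `metrics` method, and the stats passed in.

```python
class GmetricDelegate:
    """Parent: turns whatever get_data() returns into metric records."""

    # Names of stats reported as gauges; everything else is absolute.
    gauges = frozenset()

    def get_data(self):
        raise NotImplementedError("subclasses supply the stats")

    def metrics(self):
        """Yield (name, value, kind) tuples in a stable order."""
        for name, value in sorted(self.get_data().items()):
            kind = "gauge" if name in self.gauges else "absolute"
            yield name, value, kind


class GmetricCassandra(GmetricDelegate):
    """Child: overrides get_data and declares which stats are gauges."""

    gauges = frozenset({"cass_tpc_row_read_stage"})

    def __init__(self, stats):
        # In a real module the stats would come from JMX; here they are
        # passed in so the sketch stays self-contained.
        self._stats = dict(stats)

    def get_data(self):
        return dict(self._stats)
```

A module for another system would only need its own `get_data` and `gauges`, which is presumably why adding GmetricInnoDB or GmetricMySQL is cheap.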

Then on each server I run:

/usr/bin/perl -w /home/scripts/ganglia_gmetric.pl --module=GmetricCassandra

This then talks to Ganglia through gmetric to report the stats.
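For readers who have not used gmetric, the hand-off could be sketched as below, in Python rather than the Perl of the real script. The helper names, the default type `uint32`, and the default units are assumptions; `--name`, `--value`, `--type` and `--units` are standard gmetric options.

```python
import subprocess


def gmetric_command(name, value, metric_type="uint32", units="count"):
    """Build the argument list for one gmetric invocation."""
    return [
        "gmetric",
        f"--name={name}",
        f"--value={value}",
        f"--type={metric_type}",
        f"--units={units}",
    ]


def report(name, value, **kwargs):
    """Send one metric to Ganglia; requires gmetric on the PATH."""
    subprocess.run(gmetric_command(name, value, **kwargs), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is sent.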
