Archive for the ‘programming’ Category

Making rpm builds a first class citizen: How?

January 20th, 2012

In my previous post I explained why I believe the production of RPM and DEB packages should be more integrated with the rest of your development process. Now it's time to look into how you can put the RPM build scripts inside your main source code repository, and in particular how I did that to produce RPM packages for Drizzle.


Making rpm builds a first class citizen: Why?

January 20th, 2012

Last weekend I released rpm files for the latest Drizzle Fremont beta (announcement). As part of that work I've also integrated the spec file and other files used by the rpmbuild into the main Drizzle bzr repository (but not yet merged into trunk). In this post I want to explain why I think this is a good thing, and in a follow up post I'll go into what I needed to do to make it work.

(And speaking of stuff you can download, phpMyAdmin 3.5.0-alpha1 now supports Drizzle!)


Could closed core prove a more robust model than open core?

December 2nd, 2011

When participating recently in a sprint held at Google to document four free software projects, I thought about what might have prompted Google to invest in this effort. Their willingness to provide a hotel, work space, and food for some thirty participants, along with staff support all week long, demonstrates their commitment to nurturing open source.

Google is one of several companies for which I'll coin the term "closed core." The code on which they build their business and make their money is secret. (And given the enormous infrastructure it takes to provide a search service, opening the source code wouldn't do much to stimulate competition, as I point out in a posting on O'Reilly's radar blog). But they depend on a huge range of free software, ranging from Linux running on their racks to numerous programming languages and libraries that they've drawn on to develop their services.

So Google contributes a lot back to the free software community. They release code for many non-essential functions. They promote the adoption of standards such as HTML 5. They have been among the first companies to offer APIs for important functions, including their popular Google Maps. They have opened the source code to Android (although its development remains under their control), which has been the determining factor in making Android devices compete with the arguably more highly-functioning iOS products. They even created a whole new programming language (Go) and are working on another.

Google is not the only "closed core" company (for instance, Facebook has also built their service around APIs and released their Cassandra project). Microsoft has a whole open source program, including some important contributions to health IT. Scads of other companies, such as IBM, Hewlett Packard, and VMware, have complex relationships to open source software that don't fit a simple "open core" or "closed core" model. But the closed core trend represents a fertile collaboration between communities and companies that have businesses in specific areas. The closed core model requires businesses to determine where their unique value lies and to be generous in offering the public extra code that supports their infrastructure but does not drive revenue.

This model may prove more robust and lasting than open core, which attracts companies occupying minor positions in their industries. The shining example of open core is MySQL, but its complex status, including a long history of dual licensing and simultaneous development by several organizations, makes it a difficult model from which to draw lessons about the whole movement. In particular, Software as a Service redefines the relationships that the free software movement has traditionally defined between open and proprietary. Deploying and monitoring the core SaaS software creates large areas for potential innovation, as we saw with Cassandra, where a company can benefit from turning their code into a community project.



What’s New in CFEngine 3: Making System Administration Even More Powerful

October 28th, 2011

CFEngine is both the oldest and the newest of the popular tools for automating site administration. Mark Burgess invented it as a free software project in 1993, and years later, as deployments in the field outgrew its original design, he gave it a complete rethink and developed the powerful concept of promise theory to make it modular and maintainable. In this guise as version 3, CFEngine stands along with two other pieces of free software, Puppet and Chef, as key parts of enterprise computing. Along the way, Burgess also started a commercial venture, CFEngine AS, that maintains both the open source and proprietary versions of CFEngine.

Diego Zamboni has recently taken the position of Senior Security Advisor at CFEngine AS and is writing a book for O'Reilly on CFEngine 3. I talked to him this week about the recent new release of the open source version (3.2.4) in tandem with a new commercial release of CFEngine 3 Nova (version 2.1.3). Here are excerpts of what he has written to introduce CFEngine 3.

CFEngine 3 is fine-tuned to the features and design that make it possible to automate very large numbers of systems in a scalable and manageable way. CFEngine 3 is also very lightweight--its binaries normally use less than 30MB of disk space, it requires a single TCP port to communicate among servers and clients, and it has been designed to be very resource-efficient. CFEngine 3 can run on everything from smartphones to supercomputers.

CFEngine 3 is different from many other automation mechanisms in that you do not need to tell it what to do. Instead, you specify the state in which you wish the system to be, and CFEngine 3 will automatically and iteratively decide the actions to take to reach the desired state, or as close to it as possible. Underlying this ability is a powerful theoretical model known as Promise Theory, which was initially developed for CFEngine 3, but which has also found other applications in Computer Science and in other fields such as Economics and Organization.

This allows you to develop building blocks for complex promises that remain readable and manageable because the lower-level components are encapsulated. Each promise represents the desired state of certain parts of the system. At the lowest level, these are some of the things that you can express to CFEngine 3 as desired states:

  • "Make sure file /foo/bar contains line xyz"

  • "Make sure user foobar exists/does not exist"

  • "Make sure process foo is/is not running"
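To make the "desired state" idea concrete, here is a minimal sketch in Python of the first promise above. It is not CFEngine policy syntax and not how CFEngine is implemented; the function name `ensure_line` is hypothetical and chosen only for illustration. The point is that the caller states what should be true, and the function repairs the file only when it is not already compliant, so running it repeatedly is harmless.

```python
from pathlib import Path


def ensure_line(path, line):
    """Make sure the file at `path` contains `line`.

    Returns True if a repair was made, False if the file was
    already in the desired state. (Hypothetical helper, for
    illustration only; not part of CFEngine.)
    """
    target = Path(path)
    # A missing file is treated as an empty one.
    existing = target.read_text().splitlines() if target.exists() else []
    if line in existing:
        return False  # already compliant: do nothing
    existing.append(line)
    target.write_text("\n".join(existing) + "\n")
    return True
```

Calling `ensure_line("/foo/bar", "xyz")` twice makes a change the first time and none the second, which is the convergent behaviour the excerpt describes.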

At a higher level of abstraction, you can encapsulate CFEngine 3 operations and express high-level desired states:

  • "Make sure all web servers have Apache installed"

  • "Make sure all root accounts have the same, centrally-designated password"

  • "Make sure parameters EnableDNS and AllowRoot are disabled on all sshd configurations"

And at an even higher level, you can express top-level desired states like these:

  • "Configure host xyz as a database server"

  • "Create a new cluster of VMs to use as web servers"

So what's in the new versions? CFEngine 3 Nova includes:

  • System monitoring extensions, which extend the monitoring capabilities of CFEngine 3 Community (to monitor system state such as CPU load, number of processes and network connections, disk utilization, etc.) to allow for defining custom monitors for any type of information.

  • Support for manipulating virtual machines on Xen, VMware ESX, and KVM.

  • Native Windows support.

  • Flexible searching of reports in a brand new scalable interface that supports thousands of hosts on a single hub.

  • Improved machine learning and anomaly monitoring for diagnostics and capacity planning. Additional sensors have been added to detect operating system performance and behavioral trends, especially on Linux kernels.

  • The NoSQL document-oriented database MongoDB, used instead of MySQL for all storage on Nova's Mission Portal.

  • Generic JSON return values so that users can customize the interface and JQuery framework of the Mission Portal. This allows direct access to data in a way that makes higher levels of scripting more effective.

CFEngine 3 Community also includes a large number of improvements, all of which are in Nova too:

  • A vastly improved bootstrapping process, which makes it easy to get new CFEngine 3 servers and clients up and running with very little manual configuration.

  • Support for environments, which are a way of grouping hosts according to arbitrary definitions. This makes it very easy to define, for example, "development," "testing," and "production" environments for CFEngine 3 policies.

  • The new cf-report command, available in both Community and Nova, which allows extraction of data and generation of reports from the command line. It can produce reports both about the behavior of the current CFEngine 3 environment (policies, hosts, etc.) and about internal information, such as a CFEngine 3 syntax summary.

  • Many performance and concurrency improvements and bug fixes.

  • Several new functions and parsing improvements, including and(), not(), and or() functions, to ease writing of complex class expressions.

  • A new and improved Emacs mode for editing CFEngine 3 policy files.


Developer Week in Review: These things always happen in threes

October 26th, 2011

Fall is being coy this year in the Northeast. We've been having on and off spells of very mild, almost summer-like weather over the last few weeks. That trend seems to be finally ending, alas, as there is possible snow forecasted for the weekend in New Hampshire. As the old joke goes, if you don't like the weather here, just wait five minutes.

The fall also brings hunting to the area. The annual moose season just concluded (you need to enter a special lottery to get a moose permit), but deer season is just about to open. My son and I won't be participating this year, but we recently purchased the appropriate tools of the trade, a shotgun to hunt in southern NH (where you can't hunt deer with a rifle) and a Mosin Nagant 91/30 for the rest of the state. The latter is probably overkill, but my son saved up his pennies to buy it, being a student of both WWII and all things Soviet. Hopefully, he won't dislocate his shoulder firing it ...

Meanwhile, in the wider world ...

John McCarthy: 1927-2011

It's been a sad month for the computer industry, with the deaths of Steve Jobs and Dennis Ritchie already fact. Less well known, but equally influential, AI pioneer and LISP creator John McCarthy passed away on Sunday. McCarthy was involved in the creation of two of the preeminent AI research facilities in the world, at MIT and Stanford, and he is generally credited with coining the term "artificial intelligence."

LISP has had its periods of popularity, peaking in the 1980s, but it's never been a mainstream language in the way that C, FORTRAN, BASIC or Java was. What people tend to forget is just how old LISP really is. Only FORTRAN, COBOL and ALGOL are older than LISP, which came on the scene in 1958. Many of the concepts we take for granted today, such as closures, first saw light in LISP. It also lives in the hearts of Emacs and AutoCAD, among others, and LISP is the language used in much of the groundbreaking artificial intelligence work.

On a side note, when I first met my wife and told her I was involved in the AI field, she gave me a truly strange look. She had a BA in animal science, you see, and in that field "AI" stands for artificial insemination.


Someone finally admits the dirty truth about the GPL

If you listen to Richard Stallman, the GPL is all about being a coercive force that will eventually drive all software to be free (as in freedom.) Those of us who watch such things have noticed that it has a paradoxical effect, however. Companies like MySQL (now Oracle) use it the same way that drug dealers offer free samples to new customers. "The first one's free, but you'll be back for more." In other words, they get you hooked by offering a GPL version, but cash in when you want to use their product for commercial purposes because the GPL is too dangerous for most companies.

Now, Python developer Zed Shaw has brought the GPL's dirty little secret into the light of day. In a particularly NSFW rant, Shaw explains why he chooses to use the GPL these days. In short, it's because he's sick of developers at companies getting to be heroes by using his stuff and getting the glory. "I use the GPL to keep you honest. You now have to tell your bosses you're using my gear. And it will scare the piss out of them." He goes on to say that he's using the GPL as a stick to force companies to pay him to use his software.

This goes right to the very core of the debate about what free/open software should be about. Is it a tool to make all software free? Is it a way to allow "good" people (i.e., non-commercial users) to have access while punishing "bad" people (professional developers)? Personally, I'm thrilled that Southwest Airlines uses a Java library I created for another client years ago and open sourced, but evidently some people (especially those who aren't getting paid to maintain open-source projects by a day job) want to get paid for their efforts.

I find the logic a bit questionable. I don't see a lot of difference between a free software developer who holds corporate users' feet to the fire and a commercial software developer. Sure, it still allows hobbyists and educational users to use the software for free, but it's actually acting to discourage companies from getting involved in FL/OSS by encouraging the wrong model. When companies use open-source software in their products, they are more likely to contribute back to the project and to open source other non-critical code they produce. If they are paying a developer for it, they are much less likely to contribute back.

The Steve Jobs movie: I predict lots of people walking and talking

With the Steve Jobs biography currently sitting at the top of Amazon's bestseller list, Sony Pictures is wasting no time getting a film adaptation underway. The current buzz is that Aaron Sorkin, creator of the West Wing and winner of the Academy Award for his adaptation of "The Social Network," is on the short list to write the screenplay.

It would be interesting to see how Sorkin would tackle Jobs' story, full and complex as it is. One approach might be to leave out the '80s, already covered to some degree in "Pirates of Silicon Valley," and concentrate instead on his youth and the last 15 years of his life. One can only hope that the technological details are not hopelessly mangled in an attempt to make it accessible.

Got news?

Please send tips and leads here.


New algorithm for calculating 95 percentile

August 30th, 2011

The 95 percentile for query response times is an old concept; Peter and Roland blogged about it in 2008. Since then, MySQL tools have calculated the 95 percentile by collecting all values, either exactly or approximately, and returning all_values[int(number_of_values * 0.95)] (that’s an extreme simplification). But recently I asked myself*: must we save all values? The answer is no. I created a new algorithm** for calculating the 95 percentile that is faster, more accurate, and saves only 100 values.***
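For reference, the "collect everything" approach in that extreme simplification can be sketched in a few lines of Python. This is only an illustration of the all_values[int(number_of_values * 0.95)] idea, not code from any MySQL tool; the function name is hypothetical, and the index is computed with integer arithmetic (an assumption made here to avoid floating-point rounding at the boundary).

```python
def percentile_95_exact(values):
    """Exact 95 percentile by saving and sorting every value.

    Hypothetical helper illustrating the simplified formula
    all_values[int(number_of_values * 0.95)].
    """
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)  # requires keeping all values in memory
    # Same as int(n * 0.95), but in integer arithmetic.
    index = len(ordered) * 95 // 100
    return ordered[index]
```

The cost is obvious from the sketch: memory grows with the number of values, and every call pays for a full sort.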

Firstly, my basis of comparison is the 95 percentile algo used by mk-query-digest. That algo is fast, memory-stable, and very proven in the real world. It works well for any number of values, even hundreds of thousands of values. It saves all values by using base 1.05 buckets and counting the number of values that fall within the range of each bucket. The results are not exact, but the differences are negligible because a 10ms and 13ms response time are indiscernible to a human. Any algo that hopes to handle very large numbers of values must approximate because not even C can store and sort hundreds of thousands of floats (times N many attributes times N many query classes) quickly enough.
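The bucketing idea can be sketched roughly as follows. This is a simplified illustration in Python, assuming positive values and returning the lower bound of the bucket that holds the 95 percentile; it is not the mk-query-digest implementation, and the names `BASE`, `bucket_index` and `approx_percentile_95` are made up for the example.

```python
import math
from collections import Counter

BASE = 1.05  # each bucket's upper bound is 5% above its lower bound


def bucket_index(value):
    """Bucket k covers [BASE**k, BASE**(k + 1)); value must be > 0."""
    return math.floor(math.log(value) / math.log(BASE))


def approx_percentile_95(values):
    """Approximate 95 percentile from per-bucket counts only."""
    counts = Counter(bucket_index(v) for v in values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one value")
    rank = total * 95 // 100  # 0-based rank of the 95 percentile
    seen = 0
    for k in sorted(counts):
        seen += counts[k]
        if seen > rank:
            # The value at that rank lies in bucket k; report its lower bound.
            return BASE ** k
    raise AssertionError("unreachable: rank is always below total")
```

Because only the counts are stored, memory depends on the range of the values rather than on how many there are, and the answer is never more than about 5% below the exact one.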

So when I finished the new algo, I compared it to the mk-query-digest algo and obtained the following results:

FILE                         REAL_95     OLD_95     NEW_95  OLD_DIFF NEW_DIFF  OLD_TIME NEW_TIME   FILE_SIZE  OLD_RSS  NEW_RSS
nums/500k-1-or-2               1.751      1.697      1.784    -0.054   +0.033     12.12     9.37     4500000    3.88M    2.63M
nums/100k-1-or-2               1.749      1.697      1.794    -0.052   +0.045      2.42     1.88      900000    3.88M    2.63M
nums/50k-trend-1-to-9          6.931      6.652      6.995    -0.279   +0.064      1.24     0.90      450000    3.88M    2.63M
nums/25k-trend-1-to-5          3.888      3.704      3.988    -0.184   +0.100      0.64     0.47      225000    3.88M    2.63M
nums/21k-1-spike5-1            0.997      0.992      2.002    -0.005   +1.005      0.55     0.42      189000    3.88M    2.63M
nums/10k-rand-0-to-20         19.048     18.532     19.054    -0.516   +0.006      0.29     0.21       95079    3.86M    2.62M
nums/10k-rand-0-to-10          9.511      9.360      9.525    -0.151   +0.014      0.29     0.21       90000    3.86M    2.62M
nums/4k-trend-1-to-7           5.594      5.473      6.213    -0.121   +0.619      0.14     0.09       36000    3.86M    2.63M
nums/1k-sub-sec                0.941      0.900      0.951    -0.041   +0.010      0.07     0.04        9000    3.80M    2.62M
nums/400-half-10              10.271      9.828     10.273    -0.443   +0.002      0.05     0.03        3800    3.79M    2.62M
nums/400-high-low             10.446     10.319     10.446    -0.127        0      0.05     0.03        3800    3.79M    2.62M
nums/400-low-high             10.445     10.319     10.475    -0.126   +0.030      0.05     0.03        3800    3.79M    2.63M
nums/400-quarter-10           10.254      9.828     10.254    -0.426        0      0.06     0.03        3700    3.79M    2.62M
nums/153-bias-50              88.523     88.305     88.523    -0.218        0      0.05     0.03        1500    3.79M    2.62M
nums/100-rand-0-to-100        90.491     88.305     90.491    -2.186        0      0.05     0.03         991    3.79M    2.62M
nums/105-ats                  42.000     42.000     42.000         0        0      0.05     0.03         315    3.75M    2.61M
nums/20                       19.000     18.532     19.000    -0.468        0      0.04     0.03          51    3.79M    2.62M
nums/1                        42.000     42.000     42.000         0        0      0.04     0.03           3    3.75M    2.61M

 
I generated random microsecond values in various files. The first number of the filename indicates the number of values. So the first file has 500k values. The remaining part of the filename hints at the distribution of the values. For example, “50k-trend-1-to-9” means 50k values that increase from about 1 second to 9 seconds. Number and distribution of values affects 95 percentile algorithms, so I wanted to simulate several possible combinations.

“REAL_95” is the real, exact 95 percentile; this is the control by which the “old” (i.e. the mk-query-digest) and new algos are compared. The diffs are comparisons to this control.

Each algo was timed and its memory (rss) measured, too. The time and memory comparisons are a little biased because the mk-query-digest module that implements its 95 percentile algo does more than my test script for the new algo.

The results show that the new algo is about 20% faster in all cases and more accurate in all but one case (“21k-1-spike5-1”). Also, the new algo uses less memory, but again this is a little biased; the important point is that it doesn’t use more memory to get its speed or accuracy increase.

The gains of the new algo are small in these comparisons, but I suspect they’ll be much larger given that the algo is used at least twice for each query. So saving 1 second in the algo can save minutes in data processing when there are tens of thousands of queries.

Instead of explaining the algorithm exhaustively, I have uploaded all my code and data so you can reproduce the results on your machine: new-95-percentile-algo.tar.gz. You’ll need to check out Maatkit, tweak the “require” lines in the Perl files, and tweak the Bash script (cmp-algos.sh), but otherwise I think the experiment should be straightforward. The new algo is in new-algo.pl. (new-algo.py is for another blog post.)

My ulterior motive for this blog post is to get feedback. Is the algorithm sane? Is there a critical flaw that I overlooked? Do you have a real-world example that doesn’t work well? If you’re intrepid or just curious and actually study the algo and have questions, feel free to contact me.

* By “recently asked myself” I mean that some time ago Baron and I wondered if it was possible to calculate 95 percentile without saving all values. At that time, I didn’t think it was feasible, but lately I thought and coded more about the problem.

** By “a new algorithm” I don’t mean that this has never been attempted or coded before (I doubt that), but I can’t find any examples of a similar algorithm.

*** By “saves only 100 values” I mean ultimately. At certain times, 150 values may be saved, but eventually the extra 50 should be integrated back into the base 100 values.



Developer Week in Review: Lion drops pre-installed MySQL

August 3rd, 2011


A busy week at Casa Turner, as the infamous Home Renovations of Doom wrap up, I finish the final chapters of "Developing Enterprise iOS Applications" (buy a copy for all your friends, it's a real page turner!), pack for two weeks of vacation with the family in California (Palm Springs in August, 120 degrees, woohoo!), and celebrate both a birthday and an anniversary.



But never fear, WIR fans, I'll continue to supply the news, even as my MacBook melts in the sun and the buzzards start to circle overhead.

The law of unintended consequences

If you decide to install Lion Server, you may notice something missing from the included software: MySQL. Previous releases of OS X server offered pre-installed MySQL command line and GUI tools, but they are AWOL from Lion. Instead, the geek-loved but less widely used Postgres database is installed.

It seems pretty obvious to the casual observer why Apple would make this move. With Oracle suing Google over Java, and Oracle's open source philosophy in doubt, I know I wouldn't want to stake my bottom line on an Oracle package bundled with my premier operating system. Apple could have used one of the non-Oracle forks of MySQL, but it appears they decided to skirt the issue entirely by going with Postgres, which has a clear history of non-litigiousness.

Meanwhile, Oracle had better be asking themselves if they can afford to play the games they've been playing without alienating their market base.

South Korea fines Apple 3 million won, which works out to ...

Apple has been hit with a penalty from the South Korean government that's a result of the iPhone location-tracking story that broke earlier this year. Now, Apple may have more money than the U.S. Treasury sitting in petty cash right now, but it will be difficult for them to recover from such a significant hit to their bottom line: a whopping 3 million won, which works out to a staggering ... um ... $2,830. Never mind.


Java 7 and the risks of X.0 software

Java 7 was recently released to the world with great fanfare and to-do. This week, we got a reminder why using an X.0 version of software is a risky endeavor. It turns out that the optimized compiler is really a pessimized compiler, and that programs compiled with it stand a chance of crashing. Even better, there's a chance they'll just go off and do the wrong thing.

Java 7 seems to be breaking new ground in non-deterministic programming, which will be very helpful for physics researchers working with the Heisenberg uncertainty principle. What could be more appropriate for simulating the random behavior of particles than a randomly behaving compiler?

Got news?

Please send tips and leads here.


NoSQL is What?

July 24th, 2011

I found myself reading NoSQL is a Premature Optimization a few minutes ago and threw up in my mouth a little. That article is so far off base that I’m not even sure where to start, so I guess I’ll go in order.

In fact, I would argue that starting with NoSQL because you think you might someday have enough traffic and scale to warrant it is a premature optimization, and as such, should be avoided by smaller and even medium sized organizations.  You will have plenty of time to switch to NoSQL as and if it becomes helpful.  Until that time, NoSQL is an expensive distraction you don’t need.

Uhm… WHAT?!

I’ve spent more than a few years using MySQL and have been using some NoSQL systems for the last year or so in a fairly busy environment. And scaling is only one of the considerations that factor into those decisions. Features matter too, you know. I really like MongoDB’s built-in sharding and replica sets. They kick ass. And Redis is an awesome in-memory data store that goes beyond what something like memcached offers. And being schema-less makes a whole hell of a lot of sense in some applications–probably A LOT of applications.

NoSQL systems exist for a reason–because they ARE useful to a lot of people. This isn’t some stupid bubble.

And to make switching data stores sound like something that “you will have plenty of time for” is outright nuts. There’s a lot of work involved. More than you probably expect. (Ask me how I know…)

Companies embarking on NoSQL are dealing with less mature tools, less available talent that is familiar with the tools, and in general fewer available patterns and know-how with which to apply the new technology.  This creates a greater tax on being able to adopt the technology.  That sounds a lot like what we expect to see in premature optimizations to me.

Gee, let me get this straight. If you’re using newer technology, you’re dealing with less mature tools?

No shit. But that’s how progress works. You make a choice to use something that is inferior today because it gives you more leverage in the future. That’s the path that Clayton Christensen laid out in The Innovator’s Dilemma.

There is no particular advantage to NoSQL until you reach scales that require it.

Bullshit. Have you even tried modeling an application that felt shoehorned into MySQL in a NoSQL tool? Is “saving a lot of development time” not a particular advantage? What about time-consuming schema changes?

Again, I think we need to talk about the best tool for the job, not the best tool for every job. Relational databases are not the best tool for every data storage job.

If you are fortunate enough to need the scaling, you will have the time to migrate to NoSQL and it isn’t that expensive or painful to do so when the time comes.

Seriously? I guess that has a lot to do with how you value your time. The term that comes to mind here is opportunity cost.

You can go a long long way with SQL-based approaches, they’re more proven, they’re cheaper, and they’re easier.

They are more proven, but cheaper and easier have a lot to do with your application and your real needs. This strikes me as an over-reaching generalization that doesn’t match reality.


