Archive for the ‘programming’ Category

NoSQL is What?

July 24th, 2011

I found myself reading NoSQL is a Premature Optimization a few minutes ago and threw up in my mouth a little. That article is so far off base that I’m not even sure where to start, so I guess I’ll go in order.

In fact, I would argue that starting with NoSQL because you think you might someday have enough traffic and scale to warrant it is a premature optimization, and as such, should be avoided by smaller and even medium sized organizations.  You will have plenty of time to switch to NoSQL as and if it becomes helpful.  Until that time, NoSQL is an expensive distraction you don’t need.

Uhm… WHAT?!

I’ve spent more than a few years using MySQL and have been using some NoSQL systems for the last year or so in a fairly busy environment. Scaling is only one of the considerations that factor into those decisions. Features matter too, you know. I really like MongoDB’s built-in sharding and replica sets. They kick ass. And Redis is an awesome in-memory data store that goes beyond what something like memcached offers. And being schema-less makes a whole hell of a lot of sense in some applications–probably A LOT of applications.

NoSQL systems exist for a reason–because they ARE useful to a lot of people. This isn’t some stupid bubble.

And to make switching data stores sound like something that “you will have plenty of time for” is outright nuts. There’s a lot of work involved. More than you probably expect. (Ask me how I know…)

Companies embarking on NoSQL are dealing with less mature tools, less available talent that is familiar with the tools, and in general fewer available patterns and know-how with which to apply the new technology.  This creates a greater tax on being able to adopt the technology.  That sounds a lot like what we expect to see in premature optimizations to me.

Gee, let me get this straight. If you’re using newer technology, you’re dealing with less mature tools?

No shit. But that’s how progress works. You make a choice to use something that is inferior today because it gives you more leverage in the future. That’s the path that Clayton Christensen laid out in The Innovator’s Dilemma.

There is no particular advantage to NoSQL until you reach scales that require it.

Bullshit. Have you even tried modeling an application that felt shoehorned into MySQL in a NoSQL tool? Is “saving a lot of development time” not a particular advantage? What about time-consuming schema changes?

Again, I think we need to talk about the best tool for the job, not the best tool for every job. Relational databases are not the best tool for every data storage job.

If you are fortunate enough to need the scaling, you will have the time to migrate to NoSQL and it isn’t that expensive or painful to do so when the time comes.

Seriously? I guess that has a lot to do with how you value your time. The term that comes to mind here is opportunity cost.

You can go a long long way with SQL-based approaches, they’re more proven, they’re cheaper, and they’re easier.

They are more proven, but cheaper and easier have a lot to do with your application and your real needs. This strikes me as an over-reaching generalization that doesn’t match reality.





Four short links: 7 June 2011

June 7th, 2011

  1. OMG Text -- a plugin for CSS framework Compass for directional text shadows. (via David Kaneda)
  2. Build a Cheap Bitcoin Mine -- some day it will be revealed that the act of generating a bitcoin token is helping the Russian mafia to crack nuclear missile launch codes and Afghan druglords built the Bitcoin system to destabilize the US dollar.
  3. Polycode -- a free, open-source, cross-platform framework for creative code. You can use it as a C++ API or as a standalone scripting language to get easy and simple access to accelerated 2D and 3D graphics, hardware shaders, sound and network programming, physics engines and more. The core Polycode API is written in C++ and can be used to create portable native applications. Lua interfaces. (via Joshua Schachter)
  4. Flickr Date Design -- interesting thoughts on Flickr's date design. The date your photo was taken is stored in a MySQL datetime, technically giving you the ability to label your photo as being taken solidly 800+ years before anything most of us would describe as the invention of photography. Which is a little silly. [...] Fundamentally this split between system activity time and human-editable creation date models a world where the people who use your software do something other than use your software. You have to decide how you feel about admitting that possibility. (via Nelson Minar)



MySQL Community – what do you want in a load testing framework?

May 10th, 2011

So I’ve been doing a fair number of automated load tests these past six months, primarily with Sysbench, which is a fine, fine tool. I started out using some simple bash-based loop controls to automate my overnight testing but, as usually happens with shell scripts, they grew unwieldy and I rewrote them in Python. Now I have some flexible and easily configurable code for sysbench-based MySQL benchmarking to offer the community. I’ve always been a fan of giving back to such a helpful group of people – you’ll never hear me complain that “my time isn’t free”. So, let me know what you want in an ideal testing environment (from a load testing framework automation standpoint) and I’ll integrate it into my existing framework and then release it under the BSD license. The main goal here is to have a standardized modular framework, based on sysbench, that allows anyone to compare their server performance via repeatable tests. It’s fun to see other people’s benchmarks, but they are often difficult to repeat and compare since most tests aren’t fully documented in their blog posts – this could be a solution to that.

Currently I have the harness doing iterations based on:

  • incrementing system values (choose a global dynamic variable, e.g. sync_binlog=0-1000)
  • storage engine vs storage engine for the same workload
  • thread quantity increments for read-only or read+write
  • N-node cluster workloads with weighted round-robin (WRR) traffic distribution (weighted least-connection (WLC) and others still need to be coded)
  • QPS testing for connection pool vs open/close connection
  • multi-table vs single-table workloads
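The iteration dimensions in the list above could be expressed as a test matrix. What follows is only a minimal sketch and not the harness itself: the names `TestCase` and `build_test_matrix`, the engine list, the thread counts and the step size are all assumptions made for illustration. Only the sync_binlog range of 0-1000 and the read-only / read+write split come from the list.

```python
from collections import namedtuple
from itertools import product

# Hypothetical record for one benchmark run; the field names are illustrative.
TestCase = namedtuple("TestCase", "sync_binlog engine threads mode")


def build_test_matrix(sync_binlog_values, engines, thread_counts, modes):
    """Return every combination of the given dimensions as a list of TestCase."""
    return [
        TestCase(sb, engine, threads, mode)
        for sb, engine, threads, mode in product(
            sync_binlog_values, engines, thread_counts, modes
        )
    ]


# Assumed example values: sync_binlog stepped across 0-1000, two storage
# engines, doubling thread counts, and the two workload modes from the list.
matrix = build_test_matrix(
    sync_binlog_values=range(0, 1001, 250),
    engines=["InnoDB", "MyISAM"],
    thread_counts=[1, 2, 4, 8],
    modes=["read-only", "read+write"],
)

for case in matrix[:3]:
    print(case)
print("total runs:", len(matrix))
```

Each `TestCase` would then be handed to whatever code actually launches sysbench; that part is deliberately left out here because the command line depends on the sysbench version in use.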

Outputs available: CSV, XML, JSON for easy integration into any number of the various graphing frameworks available. I’ll probably code up a lightweight Python HTTP server preloaded with Highcharts and Sparklines so you can see your benchmarks easily without having to roll your own graphs.
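As a rough illustration of the three output formats, here is a sketch that serializes a list of result rows using only the Python standard library. The function names, the field names and the sample numbers are made up for the example; they are placeholders, not real benchmark results and not the harness's actual output schema.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Placeholder rows purely for illustration; these are not measured values.
results = [
    {"threads": 1, "mode": "read-only", "qps": 100.0},
    {"threads": 2, "mode": "read-only", "qps": 190.0},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize a list of dicts to a JSON array."""
    return json.dumps(rows, indent=2)


def to_xml(rows):
    """Serialize a list of dicts to <results><run>...</run></results>."""
    root = ET.Element("results")
    for row in rows:
        run = ET.SubElement(root, "run")
        for key, value in row.items():
            ET.SubElement(run, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(to_csv(results))
print(to_json(results))
print(to_xml(results))
```

Keeping each run as a flat dict means the same rows can feed all three writers without any per-format conversion.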

Quick now, tell me what you’d like me to code for you!



Looking at OpenSuse Build Service and Launchpad PPA (aka: How to build packages for MepSQL?)

February 20th, 2011

This is the first in a series of blog posts in which I want to document how the MepSQL packages were built. In doing so, I will also end up covering the MariaDB build system (which this is based on), some of BuildBot, the Amazon EC2 cloud, and packaging DEBs and RPMs in general, so it could be interesting from many perspectives. In this first part I'll simply scribble some notes reviewing the OpenSuse Build Service and the Launchpad PPA service versus using your own servers and automating the builds with BuildBot.

Originally I just wanted to work on some new ideas on the automated build and QA system used by MariaDB. But since leaving Monty Program I didn't have access to any of those servers anymore, so as a first step I had to look into what alternatives there are for building binary packages for many operating systems and hardware platforms. In fact, this was another thing I had wanted to learn more about for a while. For instance Michal Hrušecký uses OpenSuse Build Service to build both MySQL and MariaDB packages for all RPM based distributions in the blink of an eye - I was interested to find out what's behind that magic.


Fun with Bash: aliases make your life easier… share your favorites

February 10th, 2011

I’ve always been a big fan of having a customized .bashrc file. The one I distribute to all of my servers has aliases for quick commands to save me time on the command line, functions that get work done when aliases are too simplistic, reporting for the server on each CLI login, and of course a formatted and colored prompt (for terminals that support colors). I also change certain aspects and commands based on the operating system, since I’m not always on a Red Hat box, or on Linux at all. Here’s my bashrc file – maybe you have some fun additions that you’d like to share. What saves you time on the command line?



Review: MySQL for Python by Albert Lukaszewski

January 23rd, 2011

Packt Publishing recently sent me a copy of MySQL for Python to review and after reading through the book I must say that I’m rather impressed at the variety of topics that the book covers.

It starts off with the basics of setting up MySQL for your testing/development needs by going over several of the common installation and configuration methods. After that it’s a quick intro to connection methods and simple error reporting for connections. The author gives a quick intro to CRUD and how it relates to databases and Python before heading into the common tasks of simple queries. I was surprised to see some discussion of database profiling, which is rather handy for a new coder or a person new to MySQL. Once the basics of Inserts/Selects/Updates/Deletes are covered, which is a rather quick read, there is a welcome discussion of transactions and commit methods – if you are new to MySQL and do not read this section then, believe me, you’re missing a very important topic. Most people will gloss over the basics and head right to the more advanced chapters that feature exception handling, the all too common “the mysql server has gone away” error, date and time functions, aggregate functions, and metadata queries. These chapters were the most interesting to me as they covered some great Python code that I have not yet played around with. Previously I’ve done a lot of work on those topics with Perl and PHP, so seeing how they were done in Python was a great treat. The code is concise, easy to read, and well explained.

A number of topics cover the time-saving solutions that no one should be without: namely bulk data inserting, data formatting, row iteration, and CSV parsing. Logging methods for access and changes to the database are also covered, and in the end will save your development cycle a lot of time when you are troubleshooting app-to-db interaction.
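To give a feel for what CSV parsing plus bulk inserting looks like through Python's DB-API, here is a small sketch. It is not taken from the book. To keep it runnable without a server it uses the standard library's sqlite3 module as a stand-in for a MySQL driver, and the `people` table, its columns, the sample rows and the `bulk_insert` helper are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV payload; in practice this would be read from a file.
CSV_TEXT = "name,city\nalice,Berlin\nbob,Austin\n"


def bulk_insert(conn, csv_text):
    """Parse CSV text and insert all rows in a single transaction."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["name"], r["city"]) for r in reader]
    cur = conn.cursor()
    try:
        # sqlite3 uses "?" placeholders; MySQL drivers such as MySQLdb
        # use "%s" instead, so this line is an assumption of the stand-in.
        cur.executemany("INSERT INTO people (name, city) VALUES (?, ?)", rows)
        conn.commit()
    except Exception:
        conn.rollback()  # leave the table untouched if any row fails
        raise
    return len(rows)


# Stand-in database so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
inserted = bulk_insert(conn, CSV_TEXT)
print("inserted rows:", inserted)
for row in conn.execute("SELECT name, city FROM people ORDER BY name"):
    print(row)
```

Sending all rows through one `executemany` call and committing once is the usual reason bulk inserts are faster than committing row by row.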

Two chapters, Disaster Recovery and MySQL Administration, will be of interest to DBAs in particular and possibly less so to pure developers. The author covers offline backups as well as online hot backups, two topics that no DBA should be without. The code for this type of work gets a decent amount of discussion but, as in the other chapters in the book, the theory and background of the topic are also discussed, which gives the new reader an understanding of the “why” and not just the “how”. The administration section of the book covers user creation and permissions management, along with a bit of background on the security involved with that task, and also goes into quite a lot of detail on web-based GUI administration and command line interaction for admin purposes.

Overall I enjoyed the contents of the book and would recommend taking a look if you are new to Python and MySQL or are just looking for a quick reference to the common tasks of database driven application development. This book does not cover the common ORM database interactions you’re likely to see in an app like Django or Pylons, but it will give you a solid foundation on how Python and MySQL interact without an abstraction layer. If you are writing quick admin code or building your own database interaction layer, then this book would do well to be in your collection.

You can find the book at Amazon or directly from Packt.



Simple Python: a job queue with threading

January 21st, 2011

Every so often you need to use a queue to manage operations in an application. Python makes this very simple. Python also, as I’ve written about before, makes threading very easy to work with. So in this quick program I’ll describe, via comments, how to make a simple queue where each job is processed by a thread. Integrating this code to read jobs from a MySQL database would be trivial as well; simply replace the “jobs = [...” code with a database call to a row select query.

#!/usr/bin/env python3
## DATE: 2011-01-20
## FILE: job_queue.py (a file named queue.py would shadow the standard library module)
## AUTHOR: Matt Reid
## WEBSITE: http://themattreid.com
from queue import Queue
from threading import Thread

# worker function: pull jobs off the queue one at a time until the program exits
def processor():
    while True:
        job = queue.get()  # blocks until a job is available
        try:
            print("I'm operating on job item: %s" % job)
        except Exception:
            print("Failed to operate on job")
        finally:
            # always mark the job as done so that queue.join() can return
            queue.task_done()

# set variables
queue = Queue()
threads = 4

# a list of job items. you would want this to be more advanced,
# like reading from a file or database
jobs = ["job1", "job2", "job3"]

# iterate over jobs and put each into the queue in sequence
for job in jobs:
    print("inserting job into the queue: %s" % job)
    queue.put(job)

# start some daemon threads; each one keeps processing jobs from the queue
for i in range(threads):
    th = Thread(target=processor)
    th.daemon = True
    th.start()

# wait until all jobs are processed before quitting
queue.join()
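
The database variant mentioned in the intro might look roughly like the sketch below. It is an illustration under stated assumptions rather than tested production code: the `jobs` table, its `id` and `name` columns and the `load_jobs` helper are hypothetical, and the standard library's sqlite3 module stands in for a MySQL driver so that the example runs without a server. With a real MySQL DB-API driver only the connection setup should need to change.

```python
import sqlite3


def load_jobs(conn):
    """Return job names from the (hypothetical) jobs table, oldest first."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM jobs ORDER BY id")
    return [row[0] for row in cur.fetchall()]


# Stand-in database, populated here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO jobs (name) VALUES (?)",
    [("job1",), ("job2",), ("job3",)],
)
conn.commit()

# This call takes the place of the hard-coded list of job items.
jobs = load_jobs(conn)
print(jobs)
```

The rest of the program stays the same, because the queueing loop only needs `jobs` to be an iterable of job items.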



Hosteurope.de refuses to serve mysql clients in the USA

December 3rd, 2010

So I’m looking for virtual servers in Europe for a new Python+MySQL based application and I found a provider with some good prices (Hosteurope.de). I select the VM that I want and go to the order form, but there’s no option for United States of America. I find this odd, so I email their support team to ask why I can’t pay for services if I live in the USA. Here is their response: “Unfortunately we cannot accept your order due to internal policies”. What exactly is the internal policy under which a company turns down sales specifically because the customer is from the USA? I have verifiable credit, a real address, real bank accounts, and a real business. Yet hosteurope.de refuses legitimate business from the USA.

Does anyone know why this anti-USA policy exists? And since they clearly don’t want my business, does anyone know of a good (and inexpensive) provider of virtual servers that isn’t anti-America?

----
Dear Mr Reid

> I would very much like to use your services, however there was no option for
> USA in the country drop down menu when ordering – so I was forced to choose
> Austria. Can you tell me how I can order a virtual server from your company
> while residing in the United States of America? I can pay for a year of
> services in advance. I’m starting a new project and this could turn into a
> large account for your sales team. It would seem odd to turn away perfectly
> good business.
>
> Please let me know as I need to provision a server in Germany as soon as
> possible for my application.

Unfortunately we cannot accept your order due to internal policies. We apologize for the inconvenience.

Kind regards
Dominik Antulov


Dominik Antulov
Order Management
Hosting Department

E-Mail: support@hosteurope.de
Phone: 0800 467 8387
Fax: +49 180 5 66 3233 (*)

+++ Useful links
Host Europe FAQ (frequently asked questions): http://faq.hosteurope.de
“Customers help customers” forum: https://kis.hosteurope.de/forum/

