Archive for the ‘ideas’ Category

Tuning MySQL Server Settings

Сентябрь 30th, 2010
The default configuration file for MySQL is intended not to use many resources, because its a general purpose sort of a configuration file. The default configuration does enough to have MySQL running happily with limited resources and catering to simple queries and small data-sets. The configuration file would most definitely need to be customized if you intend on using complex queries and when your good amount of data. Getting started Before you start changing configuration settings, you should keep in mind the fact that “more is not always better”. You might have 16-20 GB of RAM, but that doesn’t mean...
PlanetMySQL Voting: Vote UP / Vote DOWN

Databases: Normalization or Denormalization. Which is the better technique?

Август 30th, 2010

This has really been a long debate as to which approach is more performance orientated, normalized databases or denormalized databases. So this article is a step on my part to figure out the right strategy, because neither one of these approaches can be rejected outright. I will start of by discussing the pros and cons of both the approaches.

Pros and Cons of a Normalized database design.

Normalized databases fair very well under conditions where the applications are write-intensive and the write-load is more than the read-load. This is because of the following reasons:

  • Normalized tables are usually smaller and have a smaller foot-print because the data is divided vertically among many tables. This allows them to perform better as they are small enough to get fit into the buffer.
  • The updates are very fast because the data to be updated is located at a single place and there are no duplicates.
  • Similarly the inserts are very fast because the data has to be inserted at a single place and does not have to be duplicated.
  • The selects are fast in cases where data has to be fetched from a single table, because normally normalized tables are small enough to get fit into the buffer.
  • Because the data is not duplicated so there is less need for heavy duty group by or distinct queries.

Although there seems to be much in favor of normalized tables, with all the cons outlined above, but the main cause of concern with fully normalized tables is that normalized data means joins between tables. And this joining means that read operations have to suffer because indexing strategies do not go well with table joins.

Now lets have a look at the pros and cons of a denormalized database design.

Pros and cons of denormalized database design.

Denormalized databases fair well under heavy read-load and when the application is read intensive. This is because of the following reasons:

  • The data is present in the same table so there is no need for any joins, hence the selects are very fast.
  • A single table with all the required data allows much more efficient index usage. If the columns are indexed properly, then results can be filtered and sorted by utilizing the same index. While in the case of a normalized table, since the data would be spread out in different tables, this would not be possible.

Although for reasons mentioned above selects can be very fast on denormalized tables, but because the data is duplicated, the updates and inserts become complex and costly.

Having said that neither one of the approach can be entirely neglected, because a real world application is going to have both read-loads and write-loads. Hence the correct way would be to utilize both the normalized and denormalized approaches depending on situations.

Using normalized and denormalized approaches together.

The most common way of mixing denormalized and normalized approaches is to duplicate related columns from one table into another table. Let me show you by example:

Suppose you have a products table and an orders table.
The normalized approach would be to only have the product_id in the orders table and all the other product related information in the products table.

But that would make the query that filters by product_name and sorts by order_date inefficient because both are stored in different tables.

In a fully normalized schema, such a query would be performed in the following manner:

SELECT product_name, order_date
FROM orders INNER JOIN products USING(product_id)
WHERE product_name like 'A%'
ORDER by order_date DESC

As you can see MySQL here will have to scan the order_date index on the orders table and then compare the corresponding product_name in the products table to see if the name starts with A.

The above query can be drastically improved by denormalizing the schema a little bit, so that the orders table now includes the product_name column as well.

SELECT product_name, order_date
FROM orders
WHERE product_name like 'A%'
ORDER by order_date DESC

See how the query has become much simpler, there is no join now and a single index on columns product_name, order_date can be used to do the filtering as well as the sorting.

So can both the techniques be used together? Yes they can be, because real word applications have a mix of read and write loads.

Final words.

Although, denormalized schema can greatly improve performance under extreme read-loads but the updates and inserts become complex as the data is duplicate and hence has to be updated/inserted in more than one places.

One clean way to go about solving this problem is through the use of triggers. For example in our case where the orders table has the product_name column as well, when the value of product_name has to be updated, then it can simply be done in the following way:

  • Have a trigger setup on the products table that updates the product_name on any update to the products table.
  • Execute the update query on the products table. The data would automatically be updated in the orders table because of the trigger.

However, when denormalizing the schema, do take into consideration, the number of times you would be updating records compared to the number of times you would be executing SELECTs. When mixing normalization and denormalization, focus on denormalizing tables that are read intensive, while tables that are write intensive keep them normalized.


PlanetMySQL Voting: Vote UP / Vote DOWN

How to Brainstorm New Ideas

Май 13th, 2010

I promised in last week's post on "How to Kill Good Ideas" to follow up with some ways that more constructively help create new ideas.  The first of these is taken from an idea by Mats Kindahl's post of two other ways to kill ideas. Without further delay, here they are...

  1. Make it safe to contribute ideas
    The best way to do this is encourage risk taking and acknowledge that some ideas will fail and that's acceptable.  The people I know who are the most creative are also the most prolific when it comes to idea generation.  And some of those ideas are, objectively speaking, total crap.  But there are so many good ideas generated in the process, it really doesn't matter.
  2. Go for quantity
    One of the basic tenets of brainstorming is that you need to generate a lot of ideas.  In order to encourage that, you have to refrain from evaluating ideas during the initial phase.  You simply write every idea down, no matter what you think of it, and then try to generate more ideas. You can always winnow down the ideas later on to chose the best ones to work on. But judging raw ideas as they are suggested is the surest way to kill a brainstorming session. 
  3. Make it a team sport
    Often in meetings there's a tendency to have one person present and others passively watching or worse, critiquing.  That's not a good way to generate ideas.  Instead, it's better to break up into smaller groups and give them a short period of time (20-30 minutes) and ask them to generate ideas.  Then you need to make sure that everyone is contributing.  It's a participation sport folks!  You're not there to be a spectator.  Not only will you generate more and better ideas, people will actually enjoy the meeting and feel that they contributed something.
  4. Cross-Pollinate
    Sometimes if all of the people working on a problem are from the same background you'll run out of ideas.  Call in someone with a different perspective.  When we've done successful brainstorming sessions at offsite meetings, the best ideas come when you mix up groups across disciplines and force people to explore ideas and problems outside of their area of expertise.  Invite the salespeople into a product brainstorming session.  Heck, invite the finance team.  You might get ideas that you'd never get from engineers.
  5. Get down from the mountain
    If you find yourself short of creative ideas maybe you're too isolated.  Get out of the ivory tower, the executive suite, or from behind your computer screen and get out into the real world.  I have found going out to see customers and just asking about their problems is a tremendously useful way to generate ideas.  Conferences are also good; you can see what other people are doing and consider how to apply other ideas to the problems you have.
  6. Consider it as an experiment
    Sometimes when the stakes are very high, it's easy to end up paralyzed.  In those cases, it can be helpful to approach potential solutions as experiments.  You test them out for a period of time and then you'll know whether it works or doesn't.  In most cases you can "undo" the experiment if it doesn't work out. But be sure to know what you'll measure to know if the experiment is a success. It might be product downloads or new customers acquired, but make sure you have some basis for knowing whether the experiment succeeded or not.  And sometimes even if it fails, you'll have learned something you can do differently. 
  7. Take a break
    This is counter-intuitive, but it's sometimes the best way to break through on a tough problem. If you're too entrenched in a problem its sometimes hard to be objective or open to a radical approach.  In those cases, it makes sense to take a break and engage in some other activity.  For me, the best way to come up with ideas is to go out for a run by myself and just see what ideas come to me.  For other people it might be a walk around the office building, a hike, a bike ride or a leisurely drive.  Anything that gets you out of the mode of intense concentration into a more receptive way of thinking will work.
  8. Be optimistic
    Sometimes the only difference between achieving success or failure on a problem is the belief that there is a solution and the willingness to continue to make the effort to strive towards it.  And every failure along the way is just a stepping stone.  Personally, I think it is better to be an optimist in life than a pessimist or even worse, a cynic. Besides, who wants to hang out with a pessimist? 

As before, I've stopped this "Top 10" listing short to encourage others to share their observations on how to come up with creative ideas. 


PlanetMySQL Voting: Vote UP / Vote DOWN

Active Cache for MySQL

Январь 11th, 2010

One of the problems I have with Memcache is this cache is passive, this means it only stores cached data. This means application using Memcache has to has to special logic to handle misses from the cache, being careful updating the cache – you may have multiple data modifications happening at the same time. Finally you have to pay with increased latency constructing the items expired from the cache, while they could have been refreshed in the background. I think all of these problems could be solved with concept of active cache

The idea with Active Cache is very simple – for any data retrieval operation cache would actually know how to construct the object, so you will never get a miss from the cache, unless there is an error. From existing tools this probably lies out best on registering the jobs with Gearman.

The updates of the data in this case should go through the same system so you can get serialization (or other logic) for your data updates.

You could also use the same functions updating the data when it expires. This could be exposed as explicit logic, something like expires in 300 seconds, start refresh in 200 seconds as well as automated.

The logic for automatic handling could be as follows – after the key has expired we can purge its value but keep it in cache with “expired” flag. If we can see for the same key we get a lot of requests when it is expired cache could decide to refresh such keys based on available bandwidth.

Another extension to common caching methods I’d like to see is having max_age specified on GET request. In many applications expiration is not data driven but rather request driven. Consider for example posting the blog comment on this blog. If you’re the user who posted the comment you have to see it instantly to avoid bad experience. At the same time other users can continue reading stale data – if they see comment appearing 10 seconds later they will not have any bad user experience.

Finally I think Active Cache could be very helpful handling write back scenarios. There are many cases when there is a lot of updates happening to the data – counters, last login, scores etc which do not really need to be reflected in the database instantly. If cache itself “knows” how to update the data you could define the policies on how frequently the data object needs to be synced to database.

I’d like to hear some feedback if you think such concept would be helpful for your applications and if you think there are existing tools and technologies which can be used to conveniently build things like this.


Entry posted by peter | One comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN