Archive for the ‘Uncategorized’ Category

running mysql with “memlock”

Декабрь 11th, 2010

The documentation says that we need to run mysql as “root” user for it to be effective. The code also indicates the same thing where it tries to do a getuid call to check if the server is running as root user (sql/mysqld.cc:  if (locked_in_memory && !getuid())). The general rule of thumb is that we don’t want to run mysql or rather any database as root user since that leads to vulnerabilities.

Historically memlock seems to cause issues on some OS flavors where “mlockall” implementation is not stable. This is documented in mysql code with the below warning message.

The "–memlock" argument, which was enabled, uses system calls that are

unreliable and unstable on some operating systems and operating-system

versions (notably, some versions of Linux).  This crash could be due to use

of those buggy OS calls.  You should consider whether you really need the

"–memlock" parameter and/or consult the OS distributer about "mlockall"

bugs

On machines where only the database is hosted, if the memory sizing and parameters are set  optimally we might never want to use memlock.  It might be effective  to use “memlock” in machines where the database co-exists with  the application code and possibility of the database memory buffers getting swapped out is very high. Even on optimally sized standalone database machines due to various reasons (Jeremy has a nice blog post on this), there are chances of the database memory buffers getting swapped.

“memlock” seems to be a good option to pin the database in memory. But given the caveat that the servers needs to run as root user makes this option not usable in production since we don’t want to open up to vulnerabilities. Is there a workaround to use “memlock” in production or rather what would be the recommended way of using it.  



PlanetMySQL Voting: Vote UP / Vote DOWN

Distributed Software Testing

Декабрь 10th, 2010

About me

A word about me first: My name is Daniel Mewes and I just came over to California to work at RethinkDB as an intern for the oncoming months. After having been an undergraduate student of computer science at Saarland University, Germany for the last two years, I am exited to work on an influential real-world project at RethinkDB now. Why RethinkDB? Not only does RethinkDB develop an exciting and novel piece of database technology, RethinkDB also provides the great “startup kind” of work experience.

Software testing

In complex software systems like database management systems, different components have to work together. These components can interact in complex ways, yielding a virtually infinite number of possible states that the overall system can reach. This has consequences for software testing. As bugs in the code might only show up in a small fraction of the possible states, comprehensive testing of the system is essential. Encapsulation of code and data into objects can reduce the number of states that must be considered for any single piece of code. However an extremely large number of states can still remain, especially when considering parallel systems. Reliability requirements for database management systems on the other hand are stringent. Losing or corrupting data due to bugs in the program cannot be tolerated here.

Among other measures, we at RethinkDB ensure the reliability of our software by running extensive tests on a daily basis. The problem with these tests is that they take a lot of time to complete. We recently reached time requirements of more than 24 hours on a decent machine for a single test run. So clearly a single machine is not enough anymore to run the tests. For our daily test runs, we want to get results quickly. Buying more machines is pricey, especially as those machines would be idle during the times at which no tests are run. It also is not very flexible.

Tapping into the endless resources of the cloud to ensure software quality and reliability

Cloud computing provides a more flexible and less pricey way to circumvent the limitations of limited local hardware resources. We decided to use Amazon’s Elastic Compute Cloud (Amazon EC2). If you need the computing power of ten systems, you can get that from EC2 in a matter of minutes. If you need the power of a hundred machines, you can get that in a matter of minutes, too. Basically, Amazon’s EC2 provides you with as much computing power as you need, at just the time that you need it. EC2 allows to dynamically allocate and deallocate virtual compute nodes, which are billed on an hourly basis. Each node can be used like a normal computer. The nodes run Linux (Windows nodes are also available) and are accessible through SSH. So EC2 looked like a promising platform to make our tests finish faster.

EC2 console showing a few nodes

EC2 console showing a few nodes

Our existing test suite already split up the work into independent test scripts. What was missing for utilizing EC2 was an automated mechanism to start and setup a number of EC2 nodes and dispatch the individual tests to these nodes to run in parallel. Setting up a node especially involves the step of installing a current build of RethinkDB together with a number of dependencies on the node’s file system. I wrote a Python script to fulfill exactly these tasks. Our main concern was to improve the overall performance of the testing process as much as possible.

In more detail, our new distributed testing tool works in the following steps:

  • Allocate a number of nodes in Amazon’s EC2.
  • Once all nodes are up and booted, install the current build of RethinkDB on each of them. As the bandwidth of the Internet connection in our office is much lower than what is available to the EC2 nodes, we use SFTP to install RethinkDB on only one of the nodes and then let that node distribute it to all remaining ones.
  • We can now start running tests on the nodes:
    • Pick a test from the list of all individual tests to be run.
    • Find a node which is not currently busy running another test. If no node is available, wait until a node becomes free.
    • Initiate the test on the free node. To do this, we use a wrapper script which we invoke and immediately background on the remote node. The wrapper script takes care of running the actual test and redirecting its output and result into specific files, which we can later retrieve asynchronously.
  • After repeating step 3 for all tests in the list, wait for all nodes to finish their current work.
  • Collect the results of all tests from the different nodes. This works by reading from the files in which our wrapper script has stored the tests’ results.
  • Finally, terminate the allocated nodes in EC2.

To communicate with the compute nodes, I opted for the use of Paramiko, an implementation of SSH2 for Python. Having direct access to the SSH2 protocol from a Python script makes running commands remotely as well as fetching and installing files from/into the remote systems very convenient. For allocating and terminating EC2 nodes, we use Boto, which provides an interface for accessing Amazon’s AWS API from within Python programs.

The results are convincing: Instead of 26 hours on a (fast) local machine, running all of our tests takes only 4 hours when distributed across ten nodes in EC2. By using still more nodes, the time for testing can be lowered even further. This is very useful. Say we just made an important change to our code and want to verify that everything works as it is supposed to. With local test runs, this would mean waiting at least a day, even longer if our testing machine is occupied with an earlier test run. If one of the test detects a problem with the change and we fix it, it takes another day at least until we can see if the fix even worked and had no other side effects. Thanks to cloud computing and our distributed testing system, we can now initiate an arbitrary number of test runs on demand, each of which finishes in a matter of mere hours.


PlanetMySQL Voting: Vote UP / Vote DOWN

Cache pre-loading on mysqld startup

Декабрь 1st, 2010

The following quirky dynamic SQL will scan each index of each table so that they’re loaded into the key_buffer (MyISAM) or innodb_buffer_pool (InnoDB). If you also use the PBXT engine which does have a row cache but no clustered primary key, you could also incorporate some full table scans.

To make mysqld execute this on startup, create /var/lib/mysql/initfile.sql and make it be owned by mysql:mysql

SET SESSION group_concat_max_len=100*1024*1024;
SELECT GROUP_CONCAT(CONCAT('SELECT COUNT(',column_name,') FROM ',table_schema,'.',table_name,' FORCE INDEX (',index_name,')') SEPARATOR ' UNION ALL ') INTO @sql FROM information_schema.statistics WHERE table_schema NOT IN ('information_schema','mysql') AND seq_in_index = 1;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET SESSION group_concat_max_len=@@group_concat_max_len;

and in my.cnf add a line in the [mysqld] block

init-file = /var/lib/mysql/initfile.sql

That’s all. mysql reads that file on startup and executes each line. Since we can do the whole select in a single (admittedly quirky) query and then use dynamic SQL to execute the result, we don’t need to create a stored procedure.

Of course this kind of simplistic “get everything” only really makes sense if the entire dataset+indexes fit in memory, otherwise you’ll want to be more selective. Still, you could use the above as a basis, perhaps using another table to provide a list of tables/indexes to be excluded – or if the schema is really stable, simply have a list of tables/indexes to be included instead of dynamically using information_schema.

Practical (albeit niche) application:

In a system with multiple slaves, adding in a new slave makes it start with cold caches, but since with loadbalancing it will pick up only some of the load it often works out ok. However, some environments have dual masters but the application is not able to do read/write splits to utilise slaves. In that case all the reads also go to the active master. Consequentially, the passive master will have relatively cold caches (only rows/indexes that have been updated will be in memory) so in case of a failover the amount of disk reads for the many concurrent SELECT queries will go through the roof – temporarily slowing the effective performance to a dismal crawl: each query takes longer with the required additional disk access so depending on the setup the server may even run out of connections which in turn upsets the application servers. It’d sort itself out but a) it looks very bad on the frontend and b) it may take a number of minutes.

The above construct prevents that scenario, and as mentioned it can be used as a basis to deal with other situations. Not many people know about the init-file option, so this is a nice example.

If you want to know how the SQL works, read on. The original line is very long so I’ll reprint it below with some reformatting:

SELECT GROUP_CONCAT(CONCAT(
  'SELECT COUNT(',column_name,')
          FROM ',table_schema,'.',table_name,
          ' FORCE INDEX (',index_name,')'
       ) SEPARATOR ' UNION ALL ')
  INTO @sql
  FROM information_schema.statistics
  WHERE table_schema NOT IN ('information_schema','mysql')
  AND seq_in_index = 1;

The outer query grabs each regular db/table/index/firstcol name that exists in the server, writing out a SELECT query that counts all not-NULL values of the indexed column (so it must scan the index), forcing that specific index. We then abuse the versatile and flexible GROUP_CONCAT() function to glue all those SELECTs together, with “UNION ALL” inbetween. The result is a single very long string, so we need to tweak the maximum allowed group_concat output beforehand to prevent truncation.


PlanetMySQL Voting: Vote UP / Vote DOWN

Fixing data warehousing queries with group-by

Ноябрь 29th, 2010

With the standard data warehousing queries we have a fact table and dimension tables and we join them.
For example, the fact table (Table size: 5M rows, ~2G in size) from my previous Loose index scan vs. covered indexes in MySQL post:

    CREATE TABLE `ontime_2010` (
      `YearD` int(11) DEFAULT NULL,
      `MonthD` tinyint(4) DEFAULT NULL,
      `DayofMonth` tinyint(4) DEFAULT NULL,
      `DayOfWeek` tinyint(4) DEFAULT NULL,
      `Carrier` char(2) DEFAULT NULL,
      `Origin` char(5) DEFAULT NULL,
      `DepDelayMinutes` int(11) DEFAULT NULL,
      `AirlineID` int(11) DEFAULT NULL,
      `Cancelled` tinyint(4) DEFAULT NULL,
    ... more fields here ...
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

(this is not the best possible fact table as the data is not aggregated by I’ll use it for now).

And we have those dimensions tables:

 CREATE TABLE `airlines` (
  `AirlineID` int(11) NOT NULL DEFAULT '0',
  `AirlineName` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`AirlineID`),
  KEY `AirlineName` (`AirlineName`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `date_dayofweek` (
  `code` int(11) NOT NULL DEFAULT '0',
  `description` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`code`),
  KEY `description` (`description`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select * from date_dayofweek order by code;
+------+-------------+
| code | description |
+------+-------------+
|    1 | Monday      |
|    2 | Tuesday     |
|    3 | Wednesday   |
|    4 | Thursday    |
|    5 | Friday      |
|    6 | Saturday    |
|    7 | Sunday      |
|    9 | Unknown     |
+------+-------------+
8 rows in set (0.00 sec)

So here is the example query (find sum of cancelled flights on Sundays for the given airline group by day):

select sum(Cancelled), FlightDate, AirlineName
from ontime_2010 o, date_dayofweek dow, airlines a
where o.dayofweek=dow.code and dow.description = 'Sunday'
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL'
group by FlightDate order by FlightDate desc limit 10\G

To fix the query we can add a covered index for ontime_2010, so that all fields for ontime_2010 table will be covered:

alter table ontime_2010 add key cov2(AirlineID, dayofweek, FlightDate, Cancelled);

However we will still have “temporary table and filesort”:


mysql> explain select sum(Cancelled), FlightDate
from ontime_2010 o, date_dayofweek dow, airlines a
where o.dayofweek=dow.code and dow.description = 'Sunday'
and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL'
group by FlightDate order by FlightDate desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: dow
         type: ref
possible_keys: PRIMARY,description
          key: description
      key_len: 258
          ref: const
         rows: 1
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ref
possible_keys: PRIMARY,AirlineName
          key: AirlineName
      key_len: 258
          ref: const
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: o
         type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
          key: cov2
      key_len: 7
          ref: ontime.a.AirlineID,ontime.dow.code
         rows: 24417
        Extra: Using where; Using index
3 rows in set (0.00 sec)
To avoid filesort we can re-write this query with "subqueries":
mysql> explain select sum(Cancelled), FlightDate  from ontime_2010 o
where o.dayofweek= (select code from date_dayofweek where description = 'Sunday')
and AirlineID = (select AirlineID from airlines where AirlineName = 'Delta Air Lines Inc.: DL')
group by FlightDate limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: o
         type: ref
possible_keys: DayOfWeek,covered,AirlineID,cov2
          key: cov2
      key_len: 7
          ref: const,const
         rows: 152510
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 3
  select_type: SUBQUERY
        table: airlines
         type: ref
possible_keys: AirlineName
          key: AirlineName
      key_len: 258
          ref:
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 2
  select_type: SUBQUERY
        table: date_dayofweek
         type: ref
possible_keys: description
          key: description
      key_len: 258
          ref:
         rows: 1
        Extra: Using where; Using index
3 rows in set (0.00 sec)

As MySQL will use indexes when we have "field = (select .. )" and now all fields in the index belong to the single table, MySQL will use index and avoid filesort. Please note: this will not work with "field in (select ...)" and also make sure that the subselect part will return only 1 row.


PlanetMySQL Voting: Vote UP / Vote DOWN

YPDNGG: You Probably Don’t Need Golden Gate

Ноябрь 23rd, 2010

Before launching into this, I must give due deference to Mogens Nørgaard’s landmark article, You Probably Don’t Need RAC (YPDNR), available here, but originally published Q3 2003 in IOUG Select Journal.  Mogens showed that you can be a friend of Oracle without always agreeing with everything they do.

Suddenly, everyone is asking about Golden Gate.  In July 2009, Oracle bought Golden Gate Software, just one of several companies that have developed log-based replication mechanisms for Oracle and other databases.  This was one of many major acquisitions by Oracle in 2009, including Sun and Relsys. But unlike most of Oracle’s acquisitions, Golden Gate provides very little new functionality not already available in Oracle Streams. Nevertheless, at OpenWorld 2009, Oracle made a shocking announcement.  They declared that Golden Gate would be the primary replication channel for Oracle, and that development would cease on Streams and related components.

Usually, Oracle watches these little third-party products for good ideas, then implements them independently (and better) on their own in the Oracle kernel, then watches as the little third-party companies fizzle out.  A case in point is direct memory access performance sampling.  Precise Software and several other companies in the early 2000s developed  low-impact performance sampling and visualization products for Oracle based on sampling the SGA periodically from an external program.  In version 10g, Oracle answered them with Active Session History (ASH), which did the same thing but better.  Although ASH required customers to purchase the Diagnostic Pack, it still more or less spelled the downfall of competing products.

But in the case of Golden Gate, Oracle already has a log-based replication technology (Streams) built into the kernel, available for a very reasonable price (free with Enterprise Edition).  The only major components that Streams lacks compared to Golden Gate is the ability to replicate across database platforms (Oracle to MSSQL, MySQL, etc. and vice versa).  Even that capability was clearly around the corner: In 11g, Logical Standby (Data Guard), a technology that uses essentially the same stack of components as Streams, gained cross-platform capabilities.

By 11g, Streams has become a mature and stable product, and is far more scalable and configurable than Golden Gate in many ways.  Streams can mine logs on the source or the target, or even a third system.  Depending on the load profile, you can use a wide variety of configuration choices, including parallelism at almost any point.  Streams also allows customers to choose to enforce transaction order or not.

In contrast, Golden Gate’s parallelism is restricted to the apply side, and in parallel mode, does not have the option of guaranteeing transaction order (it is non-ACID). Golden Gate’s parallel apply splits work up by schema, relying on the assumption that interdependent data at the business process level is confined to a single schema at a time.  In other words, if all the tables reside in one schema, then parallel apply doesn’t work, and if they reside in many schemas, the changes in one schema may be applied out of order vis à vis the changes to the other schemas.

Streams is only one of Oracle’s preexisting features that can compete successfully in specific use cases with Golden Gate.  Even more ancient and time-tested solutions such as advanced replication and remote materialized views remain supported and highly effective, depending on the requirement.

If you look at many of the use cases where our customers have deployed Golden Gate, I find that the simplest and most scalable engineering solution would have been remote fast-refresh materialized views.  Our customers often replicate core look-up data, like exchange rates, inventory levels, and other slowly-changing data between Oracle databases within an enterprise.  For this, Golden Gate is completely unjustified, due to cost and complexity compared to remote materialized views.  If it were a question of heterogeneous (inter database product) replication, I completely understand.  But in the majority of situations where we see Golden Gate in use, it is Oracle to Oracle. Given that, I wonder how it could come to pass that responsible people would recommend and implement a solution for such a requirement involving Golden Gate.  Why would Oracle essentially abandon ten years of development and stabilization on a platform like Streams for a less mature, rudimentary product like Golden Gate? Oracle can’t possibly be asking customers to pay additional license fees for a worse version of a product they already own.

So let’s review…

Streams: Mature, complex, requires engineering, highly configurable, scalable, Oracle-only, free.

Golden Gate: Simple, east to deploy, few configuration options, less scalable, expensive, heterogeneous (inter-RDBMS), might break your data.

For me, the corporate direction with regard to Golden Gate is perplexing and smacks of sales-driven (as opposed to requirements and cost-driven) engineering.  I can only imagine what it must be like for the team at Oracle that built Log Miner and AQ into an impressive suite of options including Streams.

Related posts:

  1. New whitepaper: High availability without breaking the bank
  2. Oracle Support and Certification on AWS
  3. A Cloud over San Francisco for OpenWorld 2010


PlanetMySQL Voting: Vote UP / Vote DOWN

Converting queries with OR to Union to ulitize indexes

Ноябрь 23rd, 2010

Lets say we have a table storing mail messages and we need to show user’s mailbox: messages sent “from” and “to” the specified user.

Here is our table:

CREATE TABLE `internalmail` (
  `mail_id` int(10) NOT NULL AUTO_INCREMENT,
  `senderaddress_id` int(10) NOT NULL,
  `recipientaddress_id` int(10) NOT NULL,
  `mail_timestamp` timestamp NULL DEFAULT NULL,
... message body, etc ...
  PRIMARY KEY (`mail_id`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1

And our query:

select * from internalmail
 where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0)
and mail_timestamp > '2010-08-01 12:30:47'
order by mail_timestamp desc 

In this query we show all messages from and to user_id = 247 plus all messages to system user (user_id=0). We need to show only messages for the last 3 months and show the most recent messages first.

To speed up the query we can try creating indexes:

KEY `recipientaddress_id` (`recipientaddress_id`),
KEY `senderaddress_id` (`senderaddress_id`),
KEY `mail_timestamp` (`mail_timestamp`),

However, as the query uses “OR”, MySQL will use a filesort.

mysql> explain select * from internalmail
where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0)
and mail_timestamp > '2010-08-01 12:30:47'
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: internalmail
         type: ALL
possible_keys: recipientaddress_id,senderaddress_id,mail_timestamp
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4843257
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

UPDATE: even if we will create combined indexes on (recipientaddress_id,mail_timestamp) and/or (senderaddress_id,mail_timestamp) those indexes will not be used, as the query contains “OR” in the where clause.

And original query runs for 3 seconds. To fix this query we can do 2 things:

  1. Rewrite query with UNION instead of OR
  2. Create combined indexes

First, we rewrite query with UNION:

(select * from internalmail where senderaddress_id = 247 and mail_timestamp > ‘2010-08-19 12:30:47′)
union
(select * from internalmail where recipientaddress_id = 247 and mail_timestamp > ‘2010-08-19 12:30:47′)
union
(select * from internalmail where recipientaddress_id = 0 and mail_timestamp > ‘2010-08-19 12:30:47′)
order by mail_timestamp desc;

Second, we create 2 indexes:

mysql> alter table internalmail add key send_dt(senderaddress_id, mail_timestamp);
mysql> alter table internalmail add key recieve_dt(recipientaddress_id, mail_timestamp);

After that, MySQL will be able to fully utilize index for each of the 3 queries in union:

mysql> explain
(select * from internalmail where senderaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')
union
(select * from internalmail where  recipientaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')
union
(select * from internalmail where  recipientaddress_id = 0  and mail_timestamp > '2010-08-19 12:30:47')
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: internalmail
         type: range
possible_keys: senderaddress_id,mail_timestamp,send_dt
          key: send_dt
      key_len: 9
          ref: NULL
         rows: 5
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 11
        Extra: Using where
*************************** 3. row ***************************
           id: 3
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 1
        Extra: Using where
*************************** 4. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: 
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using filesort
4 rows in set (0.00 sec)

Although this query has to perform a final filesort it is much faster: now it runs in 0 sec compared to 3 seconds originally.


PlanetMySQL Voting: Vote UP / Vote DOWN

State of the MySQL Community

Ноябрь 10th, 2010

A very interesting presentation by Kay Arnö about the State of the MySQL Community. The initial part of the presentation is about the history of how things happened with MySQL, Sun and Oracle, then it describes the current state of our community. The final part of the presentation is fairly enough promotion towards SkySQL for whom Kay is EVP of Products. I really enjoyed it and would recommend it to all of you interested in past, present and future of “MySQL”.

check it out here


PlanetMySQL Voting: Vote UP / Vote DOWN

State of the MySQL Community

Ноябрь 10th, 2010

A very interesting presentation by Kay Arnö about the State of the MySQL Community. The initial part of the presentation is about the history of how things happened with MySQL, Sun and Oracle, then it describes the current state of our community. The final part of the presentation is fairly enough promotion towards SkySQL for whom Kay is EVP of Products. I really enjoyed it and would recommend it to all of you interested in past, present and future of “MySQL”.

check it out here


PlanetMySQL Voting: Vote UP / Vote DOWN

Hey, DBA, can you run a quick report for me?

Ноябрь 5th, 2010

Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information on it. – Samuel Johnson

Sooner or later, a DBA will be asked for some information from their databases that will need to be well formatted and past the limitations of SQL. The requester will also want to run the report on an ad hoc basis.  And the user can not wait for a script or program to be written. Oh, can you set it up so they can add or drop items of data or change it around on the fly too?

There are a number of report writing programs out there and three with active open source communities are BIRT, Jaspersoft, and Pentaho, among others.  They all roughly do the same thing — take data from your instance and turn it into a good looking report.

Calpont has published three guides to help you get started with these three reporting programs and they are available at here.  Not only will these guides help a DBA learn how to use the programs but they can be compared to see which of the three better fits the organization.



PlanetMySQL Voting: Vote UP / Vote DOWN

Oracle Blamed for Laws of Nature

Ноябрь 5th, 2010

A catchy headline, and I believe more accurate than Oracle Puts the Squeeze on SMBs with MySQL Price Hike (Network World) and MySQL price hikes reveal depth of Oracle’s wallet love [MySQL Jacking up MySQL Prices] (The Register). Slightly more realistic is Oracle kills low-priced MySQL support (again The Register).

First, let’s review what Oracle has actually done: they ditched the MySQL enterprise Basic and Silver offerings. For Oracle, that makes sense. Their intended client base is “enterprise” (high end, think big corporates) and their MySQL sales and cost structure reflects this. It’s not a new thing that came with MySQL at Oracle, because MySQL at Sun Microsystems and MySQL AB before it had the same approach.

A company simply cannot operate below its market – that is not simply a matter of choice, instead it is dictated by their processes and cost structure. Smart people like Clayton Christensen at Harvard Business School have done ample research on this, here I’ll just give one simple example:

If you hire a sales person on commission and their quarterly quota is $100k, then they have to talk with clients that have at least a $10k-$20k potential (qualified leads), and they need to close (sign contract) with at least 10 within the period. They simply cannot spend any time on talking with potential $1k customers.

We may lament this state of affairs, but you can see how, given the choices made (sales person hired, commission system, quota), it’s as inevitable as an apple falling when you drop it. The way I describe this at Upstarta: if a company wants different results, they need to make sure that their business processes and cost structure lead them in that direction. But the simple fact is that most companies don’t have an internal feedback cycle that keeps an eye on these things, so they just go with the flow of consequences of common choices: aim for large(r) clients, grow turnover, get higher operational costs along the way – that in itself is a cycle and the only direction this particular one can go is up. As a natural consequence, over time old low-end offerings and clients need to be jettisoned – one way or another.

I say horay for Oracle to finally acknowledge this, since Sun Microsystems and MySQL AB before it did not (for whatever reason). This is years overdue. Whether the original MySQL company should have aimed to also serve smaller clients also is an entirely separate topic – and one which I covered at length previously (including internally in my time at MySQL AB), but it’s very much a station long passed. Once you float upward in the market, you can’t operate or move downward.

Now, are SMBs using MySQL actually getting squeezed by Oracle? They are not. There is no lock-in. This is about service contracts, not licensing. As we all know, MySQL is GPL licensed and internal use (even on a website or SaaS offering) is well within GPL parameters. There are a number of different companies offering service for MySQL, different types of service and delivery models and a corresponding wide range of pricing. So SMBs and anyone else has a choice, each can pick the type of service most suited to their needs. Let us celebrate and promote that freedom within the MySQL ecosystem, rather than being outraged about dropped apples falling!


PlanetMySQL Voting: Vote UP / Vote DOWN