Archive for the ‘mycheckpoint’ Category

Things to monitor on MySQL, the user’s perspective

Март 10th, 2010

Working on mycheckpoint, I have the intention of adding custom monitoring. That is, letting the user define things to monitor. I have my own thoughts, I would be grateful to get more input!

What would the user want to monitor?

Monitoring for the number of SELECT statements per second, InnoDB locks, slave replication lag etc. is very important, and monitoring utilities provide with this information. But what does that tell the end user? Not much.

The experienced DBA may gain a lot. The user would be more interested in completely other kind of information. In between, some information is relevant to both.

Say we were managing an on-line store. We want to monitor the health of the database. But the health of the database is inseparable from the health of the application. I mean, having little to no disk usage is fine, unless… something is wrong with the application, which leads to no new purchases.

And so a user would be interested in monitoring the number of purchases per hour, or the time passed since last successful purchase. This kind of data can only be generated by a user’s specific query. Looking at the charts, the user would then feel safer and confident in the wellness of his store app.

But let’s dig further. We want the store’s website to provide with good response. In particular, the query which returns the items in a customer’s cart must react quickly. Our user would not only want to see that purchases get along, but also that page load times (as in our example) are quick for those critical parts. And so a user should be able to monitor the time it took to execute a given query.

It can be of further interest to know how many times per second a given query is executed. This part is not easily done on the server side, and requires the user’s cooperation (or else we must analyze the general log, sniff, or set up a proxy). If the user is willing, she can log to some table each time she executes a certain query. Then we’re back to monitoring a regular table, as with the first example.

It is also possible to monitor for a query’s execution plan. Is it full scan? How many rows are expected? But given that we can monitor the time it took to execute a query, I’m not sure this is useful. If everything runs fast enough — who cares about how it executes?

Some of the above can be monitored on an altogether higher level: if  we’re talking about some web application, then we can use our Apache logs to determine load time for pages, or number of requests to our “cart items” page. But not always do we work with web servers, and we may be interested in checking the specific queries behind the scenes.

Summary

Custom monitoring can include:

  • User defined queries (number of concurrent visitors; count of successful operations per second; number of rows per given table or condition; …)
  • Execution time for user defined queries (time it takes to return cart items; find rows matching condition; sort a table; …)
  • Number of executions for a given query, per second.

I intend to incorporate the above into mycheckpoint as part of its standard monitoring scheme.

Please share your thought below.


PlanetMySQL Voting: Vote UP / Vote DOWN

Static charts vs. interactive charts

Март 2nd, 2010

I’m having my usual fun with charts. Working on mycheckpoint, I’ve generated monitoring charts using the Google Chars API. But I’ve also had chance to experiment and deploy interactive charts, JavaScript based. In particular, I used and tweaked dygraphs.

I’d like to note some differences in using charts of both kinds. And I think it makes a very big difference.

Static charts

I’ll call any image-based chart by “static chart”. It’s just a static image. Example of such charts are those generated by Google Image Charts (they now also have new, interactive charts), or RRDtool. Show below is an example of a static chart; in this example, generated by Google:

Pros and cons of static charts

Pros

  • Images can be viewed on any graphical platform. Browsers, email clients, cell phones, whatever.
  • Self contained: chart image, legend, scales: all in one image.
  • As such, easy to move around.
  • Are safe to use.

Cons

  • Images are fuzzy. Is the com_replace_psec really 0? Maybe it’s 0.1? A larger value can make lower values hard to tell.
  • Images are inaccurate: the colors can lie. The red and green lines showing are hard to tell apart. The red is painted above the green. Data gets “lost”.
  • They do not zoom (one needs to regenerate larger image)
  • Unless encoded with base64, HTML pages which include images need to link outside.
  • In the particular case of Google Charts, one is limited to 2K length URL. Trust me, it’s a big limitation! (PS, Google now support POST method to allow for up to 16K. But… it’s a POST method…)
  • In the particular case of Google Charts, one must have an internet connection.
  • In the particular case of Google Charts, one must submit data to Google.

Interactive charts

Interactive charts are those which react to your commands. These are either JavaScript or Flash based, mostly. They allow for really nice features. Take the following chart as an example: try and move over with your mouse; or select sections to zoom in.


DML

[graphDiv]
[labelsDiv]

The above chart is generated with dygraphs. Since it is embedded within my Wordpress page, the layout is affected by that of my theme. Take a look at this example page to see similar charts outside this blog site (Internet Explorer users: Maxmimize/minimize button will not work well for now. And, may I suggest Mozilla Firefox?)

Pros and cons of interactive charts

Pros

  • Can present you with exact values. No more doubt about the com_replace_psec values.
  • Can allow for zoom in, zoom out.

Cons

  • Need supporting platform. The above cannot be viewed by non-JavaScript browsers (cell phones, etc.)
  • Browser support is also an issue with JavaScript.
  • Emailing such report will result in mail blocking in many companies: mail filters will not allow for JavaScript code to pass.
  • Charts are not necessarily self-contained, in terms of the chart entity With Flash charts (e.g. Fusion Charts) this works. But in the above, the legend and scales are outside the image. As such, they cannot be just moved around.
  • HTML pages which include such charts can be self contained. The HTML page can include all the JavaScript dependencies, in addition to the chart generating code. Flash based charts cannot be self contained.

Summary

Interactive charts are cool!

I’m now integrating dygraphs into mycheckpoint (How nice it is to work with BSD & MIT licenses!). Though I may later switch to flot, interactive charts will be the next standard charting way in mycheckpoint. I will continue supporting static Google Charts, as follows from the above pros and cons list.


PlanetMySQL Voting: Vote UP / Vote DOWN

mycheckpoint (rev. 88): mount points monitoring, improved charting, enhanced auto-deploy

Февраль 10th, 2010

Revision #88 of mycheckpoint is released. In this revision:

  • Disk space monitoring
  • Improved charting
  • Enhanced auto-deploy

Disk space monitoring

mycheckpoint now monitors (on Linux only) three mount points:

  1. The “/” (root) mount point
  2. The datadir mount point
  3. The tmpdir mount point

It may well be the case that two of the above (or perhaps all three of them) share the same mount point. For example, if there isn’t any particular partition for “/tmp“, it is possible that the tmpdir (by default “/tmp“) is on the same mount point as “/“. mycheckpoint does not care.

mycheckpoint monitors and reports the mount point’s used percent, in a similar algorithm df uses.

Disk space monitoring is only possible when monitoring the local machine (i.e. mycheckpoint runs on the same machine as the monitored MySQL server). In the future mycheckpoint may also monitor additional mount points, such as the various logs mount point.

Improved charting

There has been some extensive work to turn the charts into real time-series based. Google charts does not support time series charts; when it will, the required URL length would probably be too long to be used. Some SQL tweaks made it possible to display the charts in correct time-scale even if sampling is taken on non constant interval (or fail to be taken for long periods).

For more examples see the link for HTML brief reports sample, below.

I will write more on SQL Google charts generation in the future.

Enhanced auto-deploy

mycheckpoint will now detect changes to the MySQL version, in addition to changes in mycheckpoint’s version itself. This means there’s no need in ever worrying about upgrades to either one of these components. Just use mycheckpoint to take another sample; it will auto-detect if the MySQL version is different, and start sampling all those new variables introduced in the new version (or stop sampling variables no longer used). It works both for MySQL upgrades and downgrades.

Enhanced localhost detection

To determine whether it is monitoring the local host, mycheckpoint now considers the hostname for the monitored server, and sees if it is either ‘127.0.0.1′, ‘localhost’, or the machine’s hostname or fully qualified hostname.domainname (these last two additions apply for Unix based machines, and have only been tested on Linux so far).

HTML brief reports

Getting a full HTML report is time consuming. I’ve had requests (though not officially submitted through the Issues mechanism) to make it shorter. This is as yet a difficult job. There’s just too much data to aggregate (up to ~180 days of every-5-minute-samples, in a common scenario).

HTML brief reports were introduced in previous versions, and have now been enhanced to include more data. These only present last 24 hours data, and load fast. See HTML brief report sample.

On the press

Not so new by now (it’s two months old), I’m very happy that mycheckpoint has been noted by Jeremy Zawodny in his “My Top Resources of 2009” column on Linux Magazine.

Future plans

Immediate plans for mycheckpoint are:

  • Email alerts notifications; this will allow mycheckpoint to become a real monitoring solution. Following the concept of “SQL oritented monitoring“, these will be SQL based as well.
  • Custom monitoring: allowing user defined queries to be recorded by mycheckpoint; these can then participate in alerts monitoring. This will allow for easy email notifications on program-level errors.

PlanetMySQL Voting: Vote UP / Vote DOWN

mycheckpoint rev. 76: OS monitoring, auto deploy, brief HTML and 24/7 reports

Январь 5th, 2010

Revision 76 of mycheckpoint comes with quite a few improvements, including:

  • OS monitoring (CPU, load average, memory)
  • Auto-deploy
  • Improved charting
  • Brief HTML reports
  • 24/7 charts

OS Monitoring

When monitoring the local machine, mycheckpoint now monitors CPU utilization, load average, memory and swap space.

This only applies to the Linux operating system; there is currently no plan to work this out for other operating systems.

Examples:

mysql> SELECT os_cpu_utilization_percent FROM sv_report_chart_sample;

mycheckpoint-chart-cpu-sample
mysql> SELECT ts, os_loadavg FROM mycheckpoint.sv_report_sample;
+---------------------+------------+
| 2009-12-27 11:45:01 |       1.78 |
| 2009-12-27 11:50:01 |       2.48 |
| 2009-12-27 11:55:01 |       2.35 |
...
+---------------------+------------+
mysql> SELECT report FROM mycheckpoint.sv_report_human_sample ORDER BY id DESC LIMIT 1 \G
*************************** 1. row ***************************
report:
Report period: 2009-12-27 13:20:01 to 2009-12-27 13:25:01. Period is 5 minutes (0.08 hours)
Uptime: 100.0% (Up: 334 days, 06:37:28 hours)

OS:
 Load average: 1.67
 CPU utilization: 25.2%
 Memory: 7486.4MB used out of 7985.6484MB (Active: 6685.8906MB)
 Swap: 3835.2MB used out of 8189.3750MB
...

Auto-deploy

mycheckpoint now has a version recognition mechanism. There is no need to call mycheckpoint with the “deploy” argument on first install or after upgrade. mycheckpoint will recognize a change of version and will auto-deploy before moving on to monitoring your system.

It is still possible, though, to use “deploy“, in case you just want to make sure an upgrade takes place, without issuing a monitoring action.

Improved charting

Further improvements and bug fixes made to the Google charts, including the implementation of missing values charting.

Brief HTML report

In contrast with the full blown HTML report (see sample), which presents hourly/daily/weekly reports for the many metrics, the new brief report only presents with a few hourly based charts. These include InnoDB performance, DML, OS metrics, and replication status.

To get a brief HTML report, issue:

mysql> SELECT html FROM sv_report_html_brief;


See sample brief HTML report.

24/7 charts

24/7 charts present the various metrics on a 24×7 matrix, which allows for diagnostics of usage throughout the day and week. For example, it makes it easier to see how things slow down on Saturday/Sunday; how load increases on 10:00am every day, etc.

24/7 charts are provided by the sv_report_chart_24_7 view.

DESC sv_report_chart_24_7;
+---------------------------------------+----------+------+-----+---------+-------+
| Field                                 | Type     | Null | Key | Default | Extra |
+---------------------------------------+----------+------+-----+---------+-------+
| innodb_read_hit_percent               | longblob | YES  |     | NULL    |       |
| innodb_buffer_pool_reads_psec         | longblob | YES  |     | NULL    |       |
| innodb_buffer_pool_pages_flushed_psec | longblob | YES  |     | NULL    |       |
| innodb_os_log_written_psec            | longblob | YES  |     | NULL    |       |
| innodb_row_lock_waits_psec            | longblob | YES  |     | NULL    |       |
| mega_bytes_sent_psec                  | longblob | YES  |     | NULL    |       |
| mega_bytes_received_psec              | longblob | YES  |     | NULL    |       |
| key_read_hit_percent                  | longblob | YES  |     | NULL    |       |
| key_write_hit_percent                 | longblob | YES  |     | NULL    |       |
| com_select_psec                       | longblob | YES  |     | NULL    |       |
| com_insert_psec                       | longblob | YES  |     | NULL    |       |
| com_delete_psec                       | longblob | YES  |     | NULL    |       |
| com_update_psec                       | longblob | YES  |     | NULL    |       |
| com_replace_psec                      | longblob | YES  |     | NULL    |       |
| com_set_option_percent                | longblob | YES  |     | NULL    |       |
| com_commit_percent                    | longblob | YES  |     | NULL    |       |
| slow_queries_percent                  | longblob | YES  |     | NULL    |       |
| select_scan_psec                      | longblob | YES  |     | NULL    |       |
| select_full_join_psec                 | longblob | YES  |     | NULL    |       |
| select_range_psec                     | longblob | YES  |     | NULL    |       |
| table_locks_waited_psec               | longblob | YES  |     | NULL    |       |
| opened_tables_psec                    | longblob | YES  |     | NULL    |       |
| created_tmp_tables_psec               | longblob | YES  |     | NULL    |       |
| created_tmp_disk_tables_psec          | longblob | YES  |     | NULL    |       |
| connections_psec                      | longblob | YES  |     | NULL    |       |
| aborted_connects_psec                 | longblob | YES  |     | NULL    |       |
| threads_created_psec                  | longblob | YES  |     | NULL    |       |
| seconds_behind_master                 | longblob | YES  |     | NULL    |       |
| os_loadavg                            | longblob | YES  |     | NULL    |       |
| os_cpu_utilization_percent            | longblob | YES  |     | NULL    |       |
| os_mem_used_mb                        | longblob | YES  |     | NULL    |       |
| os_mem_active_mb                      | longblob | YES  |     | NULL    |       |
| os_swap_used_mb                       | longblob | YES  |     | NULL    |       |
+---------------------------------------+----------+------+-----+---------+-------

Example:

mysql> SELECT com_select_psec, innodb_buffer_pool_pages_flushed_psec FROM mycheckpoint.sv_report_chart_24_7 \G
mycheckpoint-chart-247-sample

Trying mycheckpoint

Future plans

I haven’t got any major immediate issues; planning on user customization of charts and HTML reports. Considering thresholds and alerting for the future.


PlanetMySQL Voting: Vote UP / Vote DOWN