Archive for the ‘migration’ Category

Latest New Whitepapers in the IOUG On-line Library – 11-Jan-2012

Январь 12th, 2012

Current White Papers for the Oracle Professional

Zero Downtime Migration to Oracle Exadata using Oracle GoldenGate

In this paper we will review Oracle GoldenGate’s capabilities and how it can be used to achieve zero downtime migration and consolidation to Oracle Exadata. We will provide high-level implementation steps for migration with GoldenGate and a customer case study example. You will also find a section that describes other use cases where you can leverage GoldenGate and Oracle Data Integration products for your Oracle Exadata implementations, including Data Warehouse Appliance Migration; Real-Time Data Warehouse Updates; Bulk Data Loading; Maximum Availability; OLTP; and SOA Integration.

Spotlight on: MySQL & Oracle GoldenGate

Oracle GoldenGate is known for its capabilities to offer real-time data integration for heterogeneous databases. This document describes the key use cases where MySQL users can leverage Oracle GoldenGate to achieve higher performance and lower TCO. Solutions featured include real-time data warehousing, operational reporting, and OLTP query offloading.Top 10 Reasons why MySQL

Experts Switch to SchoonerSQL

Learn the top 10 reasons why MySQL experts switch to SchoonerSQL and how it solves the common problems users face with MySQL.

Managing the Rapid Rise in Data Growth: 2011 IOUG Survey on Database Manageability

This report goes one step further – identifying the types of issues database professionals are grappling with in rapidly expanding data environments and then providing some recommended steps for getting ahead of the database management curve from the experts at the Independent Oracle Users Group: applicable industry-wide.



PlanetMySQL Voting: Vote UP / Vote DOWN

Video: Netflix’s migration to AWS cloud

Декабрь 1st, 2010
Found this video regarding Netflix's migration to Amazon's AWS cloud very informative. Enjoy!

PlanetMySQL Voting: Vote UP / Vote DOWN

Cloud Migration Whitepapers

Декабрь 1st, 2010
Amazon's AWS team has published a series of whitepapers covering various scenarios for migrating into AWS cloud infrastructure. Links to these whitepapers are provided below for your convenience:
Cloud Migration
- Migrating applications to the AWS cloud
- Migrating web application
- Migrating batch processing applications
- Migrating backend processing pipelines

PlanetMySQL Voting: Vote UP / Vote DOWN

Why mysqldump is converting my tables from InnoDB to MyISAM?

Ноябрь 13th, 2010

First of all: mysqldump is not converting tables. It is something else. Here is the story:

One of my clients had a case when they were migrating to a new mysql server: they used mysqldump to export data from the old server (all InnoDB) and imported it to the new server. When finished, all the tables became MyISAM on the new server. So they asked me this question:
“Why mysqldump is converting my tables from InnoDB to MyISAM?”

First of all we made sure that the tables are InnoDB on the old server. It was true.
Second we run “show engines” on the new server:

+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine | Support | Comment | Transactions | XA | Savepoints |
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance | NO | NO | NO |
| MRG_MYISAM | YES | Collection of identical MyISAM tables | NO | NO | NO |
| BLACKHOLE | YES | /dev/null storage engine (anything you write to it disappears) | NO | NO | NO |
| CSV | YES | CSV storage engine | NO | NO | NO |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO |
| FEDERATED | NO | Federated MySQL storage engine | NULL | NULL | NULL |
| ARCHIVE | YES | Archive storage engine | NO | NO | NO |
+------------+---------+----------------------------------------------------------------+--------------+------+------------+


As we see, there is no InnoDB in the list. So, InnoDB was not started.
Next we look into the error log to find out, why InnoDB was not started. And we saw this:

InnoDB: Error: log file ./ib_logfile0 is of different size 0 5242880 bytes
InnoDB: than specified in the .cnf file 0 134217728 bytes!
[ERROR] Plugin 'InnoDB' init function returned error.
[ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
[Note] Event Scheduler: Loaded 0 events
[Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.51-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL)

So, InnoDB was not stared as the size of log files was changed in the my.cnf and the old log files were not moved. Although, mysql server was started, but without InnoDB. In this case mysql restored the tables, but the storage engine was substituted from InnoDB to MyISAM. For example if we create a table with non-existing storage engine, MySQL will use MyISAM instead:


mysql> create table aaa(i int) engine=non_existing_engine;
Query OK, 0 rows affected, 2 warnings (0.16 sec)

mysql> show warnings;
+---------+------+---------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------+
| Warning | 1286 | Unknown table engine 'non_existing_engine' |
| Warning | 1266 | Using storage engine MyISAM for table 'aaa' |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)

That was happened: mysql used MyISAM instead of InnoDB, produced warnings, but they are usually ignored.

The fix was easy: restart mysql using this instructions (http://dev.mysql.com/doc/refman/5.0/en/adding-and-removing.html) and upload the dump again (or convert myisam to innodb manually).


PlanetMySQL Voting: Vote UP / Vote DOWN

YAPCEU 2010 – Day Two…

Август 6th, 2010

After enjoying the excellent hospitality of our host here in Pisa (6 courses) we were ready for our second day at YAPCEU 2010 here in sunny Pisa.

Larry’s new catch phrase “My Language is a four letter word” was the ‘Buzz word’ for today. We settled down to some very interesting talks, the highlight for me being Tim Bunce’s talk on using Devel::NYTProf to Optimize your code. Tim first gave us a quick and dirty overview of optimization which covered the basics of where to start and what to look for he followed up with real examples of Optimizer output and than wrapped up with a few before and after results on an optimization effort.

The rest of the day was dedicated in my opinion, to the future of DBs in with Nelson Ferraz giving an excellent presentation of his concepts for using Perl as to glue for a Data Warehouse application. Next on my agenda, Martin Berends reports on the present state of Perl 6 and interfaces database. There is progress here as we now have some access to the DBI for MySQL and some others. Also great news is the fact there is a good deal of development work going on.

Martin was quickly followed by Tim Bunce again who presented his proposal for the new Perl 6’s DBDI. Seems
we are going to use the JDBC specification with a little tweaking as our road map for the future. Tim also
showed of some Perl 6 black majik from Jonathan Worthington and he was able to how us DBI with DBD::Pg
running on Perl 6.

Allison Randal finished off today’s formal presentations with her “Migration Strategies” presentation. She
gave us some good insight into migration in that we, as developers, cannot force migration on the community. She also provided two examples of migrations: Apache, which took seven years but has almost full buy-in by users and Python, which was quick and dirty but has not received the same buy-in from the community.

Finally we all enjoyed the lightning talks as our wrap up. A number of neat quick ones such as ‘I speak Perl with a ‘c’ accent’, was a highlight for me.

Seems the videos and most of the slides (including my presentation on XS) may be online tomorrow. I will be sure to post them as soon as I see them.


PlanetMySQL Voting: Vote UP / Vote DOWN

Datetime & Timestamp manipulation / migration explained

Июнь 15th, 2010

Are you doing some datetime manipulation or maybe you are migrating from some database technology to MySQL or possibly using milliseconds?
Here is an example on how to go about it:

Say you have the following date: MAR 16 2008 09:12:51:893AM
SELECT DATE_FORMAT(STR_TO_DATE('MAR 16 2008 09:12:51:893AM','%M %d %Y %h:%i:%s:%f%p'),'%Y%m%d%k%i%s.%f'); --> 2008031691251.893000

What if its PM rather than AM
SELECT DATE_FORMAT(STR_TO_DATE('MAR 16 2008 09:12:51:893PM','%M %d %Y %h:%i:%s:%f%p'),'%Y%m%d%k%i%s.%f'); --> 20080316211251.893000

Ok so this is just simple string manipulation where:
%M is the month name
%d is day number
%Y is the year
%h is the hour
%i is the minute
%s is the second
%f is the microsecond
%p is the period: ante or post meridiem

In the DATE_FORMAT part we se a %k which is in 24hr format in order to loose the period.

A more detailed list is found here

Here is a demo:

mysql Tue Jun 15 12:32:37 2010 > CREATE TABLE test.abc(a DECIMAL(17,3)) ENGINE=MYISAM;
Query OK, 0 rows affected (0.03 sec)

mysql Tue Jun 15 12:32:45 2010 > INSERT INTO abc VALUES ( DATE_FORMAT(STR_TO_DATE('MAR 16 2008 09:12:51:893PM','%M %d %Y %h:%i:%s:%f%p'),'%Y%m%d%k%i%s.%f') );
Query OK, 1 row affected (0.01 sec)

mysql Tue Jun 15 12:32:51 2010 > SELECT * FROM abc;
+--------------------+
| a                  |
+--------------------+
| 20080316211251.893 |
+--------------------+
1 row in set (0.00 sec)

mysql Tue Jun 15 12:32:56 2010 > SELECT TIMESTAMP(a) FROM abc;
+----------------------------+
| TIMESTAMP(a)               |
+----------------------------+
| 2008-03-16 21:12:51.893000 |
+----------------------------+
1 row in set (0.00 sec)


PlanetMySQL Voting: Vote UP / Vote DOWN

Migrating MySQL latin1 to utf8 – Character Set Options

Февраль 22nd, 2010

Continuing on from preparation in our MySQL latin1 to utf8 migration let us first understand where MySQL uses character sets. MySQL defines the character set at 4 different levels for the structure of data.

  • Instance
  • Schema
  • Table
  • Column

In MySQL 5.1, the default character set is latin1. If not specified, this is what you will get. For example.

mysql> create table test1(c1 varchar(10) not null);
mysql> show create table test1\G
Create Table: CREATE TABLE `test1` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

If you want all tables in your instance to always be a default of utf8, you can changed the server variable character_set_server. This can be set dynamically.

mysql> set global character_set_server=utf8;
mysql> set session character_set_server=utf8;
mysql> create table test2(c1 varchar(10) not null);
mysql> show create table test2\G
Create Table: CREATE TABLE `test2` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

If you change this dynamically be sure to include the option in your my.cnf to ensure this option is persisted for a mysqld restart.

You can define the default character set for all new tables in a given schema. You specify this when you create the schema.

mysql> set global character_set_server=latin1;
mysql> set session character_set_server=latin1;
mysql> create schema test_ucs2 default character set ucs2;
mysql> use test_ucs2;
mysql> create table test3(c1 varchar(10) not null);
mysql> show create table test3\G
Create Table: CREATE TABLE `test3` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=ucs2

Even though we have a schema default, you can always specify the default character set for a given table which overrides any defaults.

mysql> use test_ucs2;
mysql> create table test4_utf8 (c varchar(10) not null) default charset utf8;
mysql> show create table test4_utf8\G
Create Table: CREATE TABLE `test4_utf8` (
  `c` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

And finally, if you really wanted to be specific you can define the character set on a per column level.

mysql> create table test4_utf8_latin1 (c varchar(10) not null, c2 varchar(20) charset latin1) default charset utf8;
mysql> show create table test4_utf8_latin1\G
*************************** 1. row ***************************
       Table: test4_utf8_latin1
Create Table: CREATE TABLE `test4_utf8_latin1` (
  `c` varchar(10) NOT NULL,
  `c2` varchar(20) CHARACTER SET latin1 DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

With great flexibility comes great responsibility. You should have a defined standard for your application that is simple and easy to understand. I am not a proponent of using utf8 for everything, the primary reason why is memory. As part of my consulting I spend a lot of time with clients that have limited resources, e.g. database servers with 2GB or 4GB of RAM. MySQL stores utf8 efficiently on disk, but when this data is stored in memory for internal usage, it automatically uses 3 bytes, when on disk it may only be 1 byte. You can test this by creating a MEMORY table with latin1 and utf8 examples and comparing the difference in size. Is this a serious problem? Well that depends on many factors such as the number of database connections, persistent or not persistent connections, the size of the results etc. While it’s difficult in MySQL to instrument the memory precisely on a per connection basis, prudence should be a consideration for any physical resources, especially RAM.

Now that we understand what’s possible, how can we change our existing latin1 tables in our preparation example?

We could try a simple ALTER TABLE command.

mysql> alter table test_latin1 default charset utf8;
mysql> show create table test_latin1\G
Create Table: CREATE TABLE `test_latin1` (
  `c` varchar(100) CHARACTER SET latin1 NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

This does not work, because we are only changing the default storage engine of the table. The underlying columns remain the same. If we were to add a new column, it would default to utf8. We can however achieve what we expected with the CONVERT option.

mysql> alter table test_latin1 convert to character set utf8;
mysql> show create table test_latin1\G
Create Table: CREATE TABLE `test_latin1` (
  `c` varchar(100) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

We look at our data, and it looks great? Are we done with our conversion?

mysql> select * from test_latin1;
+---------------+
| c             |
+---------------+
| a             |
| abc           |
| ☺           |
| abc ☺☹☻ |
+---------------+

The answer is no. While it may look like the data is correct, MySQL also manages character sets for the communication channel. In this case, we are still communicating in latin1. To ensure moving forward in the future we must always communicate in utf8 to ensure we correctly pass utf8 to the database. We can test this with the mysql client, and as you will see our data is still corrupt.

mysql> set names utf8;
mysql> select * from test_latin1;
+------------------------+
| c                      |
+------------------------+
| a                      |
| abc                    |
| ☺                 |
| abc ☺☹☻ |
+------------------------+

mysql> show session variables like 'character%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

While you can see how we could migrate the schema definition, this does not complete our migration. In my next post, I will discuss the various different ways to correctly perform a data migration between latin1 and utf8.


PlanetMySQL Voting: Vote UP / Vote DOWN

Migrating MySQL latin1 to utf8 – Preparation

Февраль 11th, 2010

Before undertaking such migration the first step is a lesson in understanding more about how latin1 and utf8 work and interact in MySQL. latin1 in a common and historical character set used in MySQL. utf8 (first available in MySQL Version 4.1) is an encoding supporting multiple bytes and is the system default in MySQL 5.0

  • latin1 is a single byte character set.
  • utf8 is a 1-3 byte character set depending on the size of the character. NOTE: MySQL utf8 does not support the RFC 3629 4 byte sequences

MySQL variables

MySQL has a number of different system variables to consider, the following is the default representation in MySQL 5.1

mysql> show global variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | latin1                                                         |
| character_set_connection | latin1                                                         |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | latin1                                                         |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

MySQL enables you to specify the character set for data at the Instance,Schema,Table and Column level. This is important because this complexity, especially between the Table and Columns can catch you out.

It is also important to ensure that not only is data stored in the appropriate format, the communication between client and server also needs to support the character set you wish to use.

latin1 example

We start by creating a simple table, inserting some data, and reviewing the data.

mysql> create table test_latin1(c varchar(100) not null) default charset latin1;
mysql> insert into test_latin1(c) values ('a'),('abc'),('☺'),('abc ☺☹☻');

mysql> select c,length(c),char_length(c),charset(c),hex(c) from test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | latin1     | 61                         |
| abc           |         3 |              3 | latin1     | 616263                     |
| ☺             |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻       |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

We use the LENGTH() and CHAR_LENGTH() functions to confirm the known length of the string and the true number of characters.

In the above examples, the smiley characters can be reproduced in a web page with the following.

☺☹☻

utf8 example

In a separate session (because we re-use these later) we repeat for utf8.

mysql> set  names utf8;
mysql> show session variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

mysql> create table test_utf8(c varchar(100) not null) default charset utf8;
mysql> insert into test_utf8(c) values ('a'),('abc'),('☺'),('abc ☺☹☻');

mysql> select c,length(c),char_length(c),charset(c), hex(c) from test_utf8;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| ☺            |         3 |              1 | utf8       | E298BA                     |
| abc ☺☹☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

As you can see, the key difference here is the CHAR_LENGTH() of the utf8 strings differ.

Mixing latin1 with utf8

If we now look at the latin1 data in the utf8 session we see that while the underlying data via LENGTH(),CHAR_LENGTH() and HEX() remains unchanged (because this is the database representation of the data), the actual output presentation is garbled due to the mismatch in the client communication.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from test_latin1;
+------------------------+-----------+----------------+------------+----------------------------+
| c                      | length(c) | char_length(c) | charset(c) | hex(c)                     |
+------------------------+-----------+----------------+------------+----------------------------+
| a                      |         1 |              1 | latin1     | 61                         |
| abc                    |         3 |              3 | latin1     | 616263                     |
| ☺                    |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻          |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+------------------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

If we look at the utf8 data in latin1, we see a different garbled representation.

mysql> select c,length(c),char_length(c),charset(c),hex(c) from test_utf8;
+---------+-----------+----------------+------------+----------------------------+
| c       | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------+-----------+----------------+------------+----------------------------+
| a       |         1 |              1 | utf8       | 61                         |
| abc     |         3 |              3 | utf8       | 616263                     |
| ?       |         3 |              1 | utf8       | E298BA                     |
| abc ??? |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

Conclusion

Armed now with a basic understanding we are ready to consider what approaches we may undertake to migrate this example table, and how we may be able to verify our data.

References

Character Set Support
Connection Character Sets and Collations


PlanetMySQL Voting: Vote UP / Vote DOWN

Migrating MySQL latin1 to utf8 – Preparation

Февраль 11th, 2010

Before undertaking such migration the first step is a lesson in understanding more about how latin1 and utf8 work and interact in MySQL. latin1 in a common and historical character set used in MySQL. utf8 (first available in MySQL Version 4.1) is an encoding supporting multiple bytes and is the system default in MySQL 5.0

  • latin1 is a single byte character set.
  • utf8 is a 1-3 byte character set depending on the size of the character. NOTE: MySQL utf8 does not support the RFC 3629 4 byte sequences

MySQL variables

MySQL has a number of different system variables to consider, the following is the default representation in MySQL 5.1

mysql> show global variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | latin1                                                         |
| character_set_connection | latin1                                                         |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | latin1                                                         |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

MySQL enables you to specify the character set for data at the Instance,Schema,Table and Column level. This is important because this complexity, especially between the Table and Columns can catch you out.

It is also important to ensure that not only is data stored in the appropriate format, the communication between client and server also needs to support the character set you wish to use.

latin1 example

We start by creating a simple table, inserting some data, and reviewing the data.

mysql> create table test_latin1(c varchar(100) not null) default charset latin1;
mysql> insert into test_latin1(c) values ('a'),('abc'),('☺'),('abc ☺☹☻');

mysql> select c,length(c),char_length(c),charset(c),hex(c) from test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | latin1     | 61                         |
| abc           |         3 |              3 | latin1     | 616263                     |
| ☺             |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻       |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

We use the LENGTH() and CHAR_LENGTH() functions to confirm the known length of the string and the true number of characters.

In the above examples, the smiley characters can be reproduced in a web page with the following.

☺☹☻

utf8 example

In a separate session (because we re-use these later) we repeat for utf8.

mysql> set  names utf8;
mysql> show session variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

mysql> create table test_utf8(c varchar(100) not null) default charset utf8;
mysql> insert into test_utf8(c) values ('a'),('abc'),('☺'),('abc ☺☹☻');

mysql> select c,length(c),char_length(c),charset(c), hex(c) from test_utf8;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| ☺            |         3 |              1 | utf8       | E298BA                     |
| abc ☺☹☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

As you can see, the key difference here is the CHAR_LENGTH() of the utf8 strings differ.

Mixing latin1 with utf8

If we now look at the latin1 data in the utf8 session we see that while the underlying data via LENGTH(),CHAR_LENGTH() and HEX() remains unchanged (because this is the database representation of the data), the actual output presentation is garbled due to the mismatch in the client communication.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from test_latin1;
+------------------------+-----------+----------------+------------+----------------------------+
| c                      | length(c) | char_length(c) | charset(c) | hex(c)                     |
+------------------------+-----------+----------------+------------+----------------------------+
| a                      |         1 |              1 | latin1     | 61                         |
| abc                    |         3 |              3 | latin1     | 616263                     |
| ☺                    |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻          |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+------------------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

If we look at the utf8 data in latin1, we see a different garbled representation.

mysql> select c,length(c),char_length(c),charset(c),hex(c) from test_utf8;
+---------+-----------+----------------+------------+----------------------------+
| c       | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------+-----------+----------------+------------+----------------------------+
| a       |         1 |              1 | utf8       | 61                         |
| abc     |         3 |              3 | utf8       | 616263                     |
| ?       |         3 |              1 | utf8       | E298BA                     |
| abc ??? |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

Conclusion

Armed now with a basic understanding we are ready to consider what approaches we may undertake to migrate this example table, and how we may be able to verify our data.

References

Character Set Support
Connection Character Sets and Collations


PlanetMySQL Voting: Vote UP / Vote DOWN