Be careful with CHARSET – Indexing and joining in MySQL database

MySQL

Recently, I had to extract data from two data sources in two different databases. Both are using MySQL database. In order to make some queries, I dump those two data databases and then import and combine them into a single database, also using MySQL database. After that, I executed some queries but then the problem araised. There was a query which is taking so much time (few minutes), while it is supposed to complete in no time (few milliseconds). Here is the snipet of my query I have.

SELECT table1.pid, table1.attr1, table2.attr1
  FROM table1
 INNER JOIN table2
    ON table2.fid = table1.pid
 WHERE table1.pid = "one"

Yes, it is a simple SQL. Table table1 and table2 are those previously imported from different databases. Field id is the primary key of table1, where fid is a foreign key in table2 which contains data intended to be derieved from field pid in table1. These fields has the same data type varchar(6). Index has been created on field fid. There are about 27 thousands of rows in table1 and around 5 millions of rows in table2. If equally every record in table1 has the same number of correspondent records in table2, then this query is expected only returns around 180 rows. So, in theory, this query should take no time to complete. But the fact, it was not. Why?

After doing some researches, alone – due to the lack of information of the symptom of the problem I could not find any workaround from the Google –, I have found the the problem. The problem was that character set used in the field fid in table2 differ from one used in field pid in table1. Field pid in table table1 was using Unicode, while field fid in table table2 was using Latin character code. This difference has made the table join not working properly and even there is an index on field fid in table table2, rather than search from the index, the query execution plan has decided to scan for the entire rows of table table2 which has millions of data! So after I fixed this, the query executed in few milliseconds.

So how I found the problem?

In order to see how the query engine executing the query, in MySQL, we can use the statement EXPLAIN like the code below.

EXPLAIN
SELECT table1.pid, table1.attr1, table2.attr1
  FROM table1
 INNER JOIN table2
    ON table2.fid = table1.pid
 WHERE table1.pid = "one"

This will returns the following table.

id select_type table   type   possible_key  key      key_len  ref       rows
 1 SIMPLE      table1  const  PRIMARY       PRIMARY       20  const        1
 1 SIMPLE      table2  ref    fid           fid           21  const      179

The above table is the ideal plan, where you can see the reference key is only scaned for 179 rows which is near equal to the number of correspondent records in table table2. But how is the character set is set differently? Here is the result:

id select_type table   type   possible_key  key      key_len  ref       rows
 1 SIMPLE      table1  const  PRIMARY       PRIMARY       20  const        1
 1 SIMPLE      table2  ref    NULL          NULL        NULL  NULL   4897799

You can see, just because the character set is different, there is no key (index) used, thus the query engine has to scan for the entire rows of table table2. It is the same plan as when there is no index created on field fid in table table2.

Joining on differently set character set of indexed fields will generate the same plan as joining on no-indexed fields.

Character set in MySQL database can be set in database, table, and field, and they are independent.

So be careful when creating database. Play attention at the charater set of the database. Table new created is using this (database) character set. Same as field inside a table is using this (table) character set. But be careful. They are independent. If you change a database’s character set, tables already in that database are still using the previous character set. Same as in a table. If you change a table’s character set, fields already in that table are still using the previous character set. You should also play attention at the default character set in a MySQL instance. You can set the default character set of a MySQL instance by mention that in my.cnf configuration fie.

database. In order to make some queries, I dump those two data databases and then import and combine t hem into a single database, also using MySQL database. After that, I executed some queries but then the problem araised. There was a query which is taking so much time (few minutes), while it is supposed to complete in no time (few milliseconds).

Here is the snipet of my query I have.

SELECT table1.pid, table1.attr1, table2.attr1
  FROM table1
 INNER JOIN table2
    ON table2.fid = table1.pid
 WHERE table1.pid = "one"

Yes, it is a simple SQL. Table table1 and table2 are those previously imported from different databases. Field id is the primary key of table1, where fid is a foreign key in table2 which contains data intended to be derieved from field pid in table1. These fields has the same data type varchar(6). Index has been created on field fid. There are about 27 thousands of rows in table1 and around 5 millions of rows in table2. If equally every record in table1 has the same number of correspondent records in table2, then this query is expected only returns around 180 rows. So, in theory, this query should take no time to complete. But the fact, it was not. Why?

After doing some researches, alone – due to the lack of information of the symptom of the problem I could not find any workaround from the Google –, I have found the the problem. The problem was that character set used in the field fid in table2 differ from one used in field pid in table1. Field pid in table table1 was using Unicode, while field fid in table table2 was using Latin character code. This difference has made the table join not working properly and even there is an index on field fid in table table2, rather than search from the index, the query execution plan has decided to scan for the entire rows of table table2 which has millions of data! So after I fixed this, the query executed in few milliseconds.

So how I found the problem?

In order to see how the query engine executing the query, in MySQL, we can use the statement EXPLAIN like the code below.

EXPLAIN
SELECT table1.pid, table1.attr1, table2.attr1
  FROM table1
 INNER JOIN table2
    ON table2.fid = table1.pid
 WHERE table1.pid = "one"

This will returns the following table.

id select_type table   type   possible_key  key      key_len  ref       rows
 1 SIMPLE      table1  const  PRIMARY       PRIMARY       20  const        1
 1 SIMPLE      table2  ref    fid           fid           21  const      179

The above table is the ideal plan, where you can see the reference key is only scaned for 179 rows which is near equal to the number of correspondent records in table table2. But how is the character set is set differently? Here is the result:

id select_type table   type   possible_key  key      key_len  ref       rows
 1 SIMPLE      table1  const  PRIMARY       PRIMARY       20  const        1
 1 SIMPLE      table2  ref    NULL          NULL        NULL  NULL   4897799

You can see, just because the character set is different, there is no key (index) used, thus the query engine has to scan for the entire rows of table table2. It is the same plan as when there is no index created on field fid in table table2.

Joining on differently set character set of indexed fields will generate the same plan as joining on no-indexed fields.

Character set in MySQL database can be set in database, table, and field, and they are independent.

So be careful when creating database. Play attention at the charater set of the database. Table new created is using this (database) character set. Same as field inside a table is using this (table) character set. But be careful. They are independent. If you change a database’s character set, tables already in that database are still using the previous character set. Same as in a table. If you change a table’s character set, fields already in that table are still using the previous character set. You should also play attention at the default character set in a MySQL instance. You can set the default character set of a MySQL instance by mention that in my.cnf configuration fie.