La Vita è Bella

2007-01-26

MySQL collation charset problem and patch for NucleusCMS and WikkaWiki

First of all, I must say MySQL sucks!

As from MySQL 4.1, if the database default charset is UTF8 but the default charset setting in my.cnf is not UTF8, you must do a "SET NAMES UTF8" query after connection, to make sure that in this mysql connection, all query will be handled as UTF8. otherwise seems that it will use latin1 as default.

If the "SET NAMES" isn't queried, the articles and comments won't be malformed, but the data that stored in MySQL database isn't actually in UTF8 encoding, so that for example mysqldump, or phpmyadmin, won't get the right data.

So what we need to do is check database's charset setting, and send a "SET NAME" query to ensure we're using the database's charset setting.

I've found that LifeType did this thing, so I copied its detecting code and made patches for NucleusCMS and Wikka Wiki (and also my custom homepage for LifeType, of course!).

But before downloading the patches, I should tell you that:

First, you MUST convert your database. Cause before you patch NucleusCMS or Wikka Wiki, your articles, comments, etc. that already posted, was stored in the old way in the db.

To do that, follow the following steps (this is in case the my.cnf indicate "latin1" but your database was in "utf8", if you're using other charsets, modify them):

  1. mysqldump -h host -u username -ppassword database --default-character-set=latin1 > dump.sql
  2. Edit dump.sql, on Line 10, replace "latin1" with "utf8"
  3. mysql -h host -u username -ppassword database < dump.sql

Second, you must ensure that in your database, ALL charset settings are set to "utf8". The word "ALL" means databases, tables and fields.

Finally, here're the patches: for nucleus, for wikka, released under GPL. WARNING: Use it on your own risk, and make backup before patching.

22:47:17 by fishy - Permanent Link

May the Force be with you. RAmen