I just spent about 2 days to convert my old MySQL database from latin1 / latin1_general_ci to UTF-8 character encoding. There are about a gazillion pitfalls, especially when you work on old PHP code mixed with ZF code, on a live site of a client. The first tip is… do NOT work on a live site 😉 Backup everything to a local version or a testserver, and then start to code! Fortunately, I had the site including the database on a local XAMPP environment.

For a coarse orientation, you can use the article on wordpress.org to get an overview. There is no way to change the character encoding of an entire database with just a few keystrokes, so expect it to take a tad bit longer.

For me, the following steps did the trick, using phpMyAdmin and Notepad++ (on Windows):

  1. Create a backup of the entire database with phpMyAdmin.
  2. Change your database to UTF-8 (ALTER DATABASE mydb CHARACTER SET utf8;). This will only affect new tables, so you´re not thru yet.
  3. Select the table(s) to change, and use the export function of phpMyAdmin.
  4. Copy the exported data (INSERT INTO…) without the header (CREATE…) into Notepad++.
  5. Change the table´s charset (ALTER TABLE mytable DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;).
  6. Change all fields with latin charsets to utf8_general_ci (e.g. ALTER TABLE mytable CHANGE myfield myfield TEXT CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;). Pls note that this SQL only works for TEXT fields, refer to the article on wordpress.org on how to convert ENUM, VARCHARS and so on, or use phpMyAdmin to do so for you. If you have a lot of columns with the same format to convert, just copy&paste one SQL line, and change the column names in each. Paste those lines into phpMyAdmin´s SQL editor, that´s a lot faster than doing it for each column.
  7. By now, you should have the table as well as all columns in UTF-8.
  8. Switch to Notepad++, where all you INSERTs for that table are, and convert those lines to UTF-8.
  9. Copy&paste it into phpMyAdmin´s SQL editor.
  10. Repeat steps 3 to 9 for each table, and you should have your database converted to UTF-8.

Now, that took an hour or two ;), but there´s still the website to change to the new encoding.

First, start with your meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Next up, if applicable, change your database connection script in PHP:

mysql_set_charset("utf8");

And / or if you use Zend Framework, edit your config file:

database.params.charset = "utf8"

and your bootstrap:

ini_set('default_charset', 'UTF-8');

$view->setEncoding('UTF-8');

That should do the trick, and your entire site runs on UTF-8 instead of latin.