Chapter 132. The Character Sets and Collations that MySQL Supports

Table of Contents

The Unicode Character Sets
Platform Specific Character Sets
Character Sets for South Europe and Middle East
The Asian Character Sets
The Baltic Character Sets
The Cyrillic Character Sets
The Central European Character Sets
The West European Character Sets

Here is an annotated list of character sets and collations that MySQL supports. Because options and installation settings differ, some sites will not have all items in the list, and some sites will have items that are not on the list because defining new character sets or collations is straightforward.

MySQL supports 70+ collations for 30+ character sets.

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | ISO 8859-1 West European    | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| cp1251   | Windows Cyrillic            | cp1251_bulgarian_ci |      1 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
| macce    | Mac Central European        | macce_general_ci    |      1 |
| macroman | Mac West European           | macroman_general_ci |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| binary   | Binary pseudo charset       | binary              |      1 |
+----------+-----------------------------+---------------------+--------+
33 rows in set (0.01 sec)

NB: ALL CHARACTER SETS HAVE A BINARY COLLATION. WE HAVE NOT INCLUDED THE BINARY COLLATION IN ALL THE DESCRIPTIONS THAT FOLLOW.

The Unicode Character Sets

There are our two Unicode character sets. You can store texts in about 650 languages using these character sets. We have not added a large number of collations for these two new sets yet, but that will be happening soon. Now they have default case-insensitive accent-insensitive collations, plus the binary collation.

 +---------+-----------------+-------------------+--------+
 | Charset | Description     | Default collation | Maxlen |
 +---------+-----------------+-------------------+--------+
 | utf8    | UTF-8 Unicode   | utf8_general_ci   |      3 |
 | ucs2    | UCS-2 Unicode   | ucs2_general_ci   |      2 |
 +---------+-----------------+-------------------+--------+