MySQL 5.0 FAQ: MySQL Chinese, Japanese, and Korean Character Sets
Since the character set appears to be correct, let's see what information the INFORMATION_SCHEMA.COLUMNS
table can provide about this column:
mysql> SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
    -> FROM INFORMATION_SCHEMA.COLUMNS
    -> WHERE COLUMN_NAME = 's1'
    -> AND TABLE_NAME = 't';
+-------------+--------------------+-----------------+
| COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME  |
+-------------+--------------------+-----------------+
| s1          | ucs2               | ucs2_general_ci |
+-------------+--------------------+-----------------+
1 row in set (0.01 sec)
(See Section 19.4, “The INFORMATION_SCHEMA COLUMNS Table”, for more information.)
You can see that the collation is ucs2_general_ci instead of ucs2_unicode_ci. The reason can be
seen with SHOW CHARSET, as shown here:
mysql> SHOW CHARSET LIKE 'ucs2%';
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| ucs2    | UCS-2 Unicode | ucs2_general_ci   |      2 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)
For ucs2 and utf8, the default collation is “general”. To specify a Unicode collation, use
COLLATE ucs2_unicode_ci.
B.11.16: Why are my supplementary characters rejected by MySQL?
Before MySQL 5.5.3, MySQL does not support supplementary characters (that is, characters that
need more than 3 bytes) for UTF-8. We support only what Unicode calls the Basic Multilingual Plane /
Plane 0. Only a few very rare Han characters are supplementary; support for them is uncommon. This
has led to reports such as that found in Bug #12600, which we rejected as “not a bug”. With utf8,
we must truncate an input string when we encounter bytes that we don't understand. Otherwise, we
wouldn't know how long the bad multi-byte character is.
One possible workaround is to use ucs2 instead of utf8, in which case the “bad” characters are
changed to question marks; however, no truncation takes place. You can also change the data type to
BLOB or BINARY, which perform no validity checking.
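To check in advance whether a given character falls outside the range that 3-byte utf8 can store, its code point and UTF-8 length can be inspected outside the server. The following is a minimal sketch in Python, not part of MySQL; the two sample characters and the helper names utf8_length and is_supplementary are assumptions chosen for illustration.

```python
# Illustrative characters (chosen for this sketch, not taken from the manual).
bmp_char = "\u6f22"            # U+6F22, a Han character inside the BMP
supplementary = "\U00020000"   # U+20000, a Han character outside the BMP


def utf8_length(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def is_supplementary(ch: str) -> bool:
    """Return True if the code point lies above the BMP (beyond U+FFFF)."""
    return ord(ch) > 0xFFFF


for ch in (bmp_char, supplementary):
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} bytes, "
          f"supplementary={is_supplementary(ch)}")
```

A character for which is_supplementary returns True needs 4 bytes in UTF-8, so it is exactly the kind of input that the 3-byte utf8 character set cannot store.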
As of MySQL 5.5.3, Unicode support is extended to include supplementary characters by means of
additional Unicode character sets: utf16, utf32, and 4-byte utf8mb4. These character sets support
supplementary Unicode characters outside the Basic Multilingual Plane (BMP).
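How these three encodings represent a single supplementary character can be sketched outside the server. The Python example below is an illustration only: the sample character U+20000 is an assumption, and Python's utf-16-be, utf-32-be, and utf-8 codecs stand in for the encoding forms behind the MySQL utf16, utf32, and utf8mb4 character sets.

```python
import struct

# Illustrative supplementary character (an assumption, not from the manual).
ch = "\U00020000"  # U+20000

utf16 = ch.encode("utf-16-be")   # big-endian UTF-16: one surrogate pair
utf32 = ch.encode("utf-32-be")   # big-endian UTF-32: one 4-byte unit
utf8 = ch.encode("utf-8")        # UTF-8: a 4-byte sequence

# Split the UTF-16 form into its two 16-bit code units.
high, low = struct.unpack(">HH", utf16)

print("utf-16:", utf16.hex(), f"(surrogates {high:04X} {low:04X})")
print("utf-32:", utf32.hex())
print("utf-8: ", utf8.hex())
```

Each form takes 4 bytes for this character. In UTF-16 those bytes are two 16-bit surrogate units, which is why a character set limited to a single 16-bit unit per character, such as ucs2, cannot represent it.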
B.11.17: Shouldn't it be “CJKV”?
No. The term “CJKV” (Chinese Japanese Korean Vietnamese) refers to Vietnamese character sets
which contain Han (originally Chinese) characters. MySQL has no plan to support the old Vietnamese
script using Han characters. MySQL does of course support the modern Vietnamese script with
Western characters.
As of MySQL 5.6, there are Vietnamese collations for Unicode character sets, as described in
Section 10.1.13.1, “Unicode Character Sets”.
B.11.18: Does MySQL allow CJK characters to be used in database and table names?
This issue is fixed in MySQL 5.1, by automatically rewriting the names of the corresponding directories
and files.
For example, if you create a database named 楮 on a server whose operating system does not support
CJK in directory names, MySQL creates a directory named @0w@00a5@00ae, which is just a fancy way