MySQL 5.0 FAQ: MySQL Chinese, Japanese, and Korean Character Sets
2900
CREATE PROCEDURE p_convert(ucs2_char CHAR(1) CHARACTER SET ucs2)
BEGIN
CREATE TABLE tj
(ucs2 CHAR(1) character set ucs2,
utf8 CHAR(1) character set utf8,
big5 CHAR(1) character set big5,
cp932 CHAR(1) character set cp932,
eucjpms CHAR(1) character set eucjpms,
euckr CHAR(1) character set euckr,
gb2312 CHAR(1) character set gb2312,
gbk CHAR(1) character set gbk,
sjis CHAR(1) character set sjis,
ujis CHAR(1) character set ujis);
INSERT INTO tj (ucs2) VALUES (ucs2_char);
UPDATE tj SET utf8=ucs2,
big5=ucs2,
cp932=ucs2,
eucjpms=ucs2,
euckr=ucs2,
gb2312=ucs2,
gbk=ucs2,
sjis=ucs2,
ujis=ucs2;
/* If there is a conversion problem, UPDATE will produce a warning. */
SELECT hex(ucs2) AS ucs2,
hex(utf8) AS utf8,
hex(big5) AS big5,
hex(cp932) AS cp932,
hex(eucjpms) AS eucjpms,
hex(euckr) AS euckr,
hex(gb2312) AS gb2312,
hex(gbk) AS gbk,
hex(sjis) AS sjis,
hex(ujis) AS ujis
FROM tj;
DROP TABLE tj;
END//
The input can be any single
ucs2
character, or it can be the code point value (hexadecimal
representation) of that character. For example, from Unicode's list of
ucs2
encodings and names
(
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
), we know that the Katakana character Pe
appears in all CJK character sets, and that its code point value is
0x30da
. If we use this value as the
argument to
p_convert()
, the result is as shown here:
mysql>
CALL p_convert(0x30da)//
+------+--------+------+-------+---------+-------+--------+------+------+------+
| ucs2 | utf8 | big5 | cp932 | eucjpms | euckr | gb2312 | gbk | sjis | ujis |
+------+--------+------+-------+---------+-------+--------+------+------+------+
| 30DA | E3839A | C772 | 8379 | A5DA | ABDA | A5DA | A5DA | 8379 | A5DA |
+------+--------+------+-------+---------+-------+--------+------+------+------+
1 row in set (0.04 sec)
Since none of the column values is
3F
—that is, the question mark character (
?
)—we know that every
conversion worked.
B.11.14: Why do CJK strings sort incorrectly in Unicode? (I)
Sometimes people observe that the result of a
utf8_unicode_ci
or
ucs2_unicode_ci
search,
or of an
ORDER BY
sort is not what they think a native would expect. Although we never rule out the
possibility that there is a bug, we have found in the past that many people do not read correctly the
standard table of weights for the Unicode Collation Algorithm. MySQL uses the table found at
http://
Summary of Contents for 5.0
Page 1: ...MySQL 5 0 Reference Manual ...
Page 18: ...xviii ...
Page 60: ...40 ...
Page 396: ...376 ...
Page 578: ...558 ...
Page 636: ...616 ...
Page 844: ...824 ...
Page 1234: ...1214 ...
Page 1427: ...MySQL Proxy Scripting 1407 ...
Page 1734: ...1714 ...
Page 1752: ...1732 ...
Page 1783: ...Configuring Connector ODBC 1763 ...
Page 1793: ...Connector ODBC Examples 1773 ...
Page 1839: ...Connector Net Installation 1819 2 You must choose the type of installation to perform ...
Page 2850: ...2830 ...
Page 2854: ...2834 ...
Page 2928: ...2908 ...
Page 3000: ...2980 ...
Page 3122: ...3102 ...
Page 3126: ...3106 ...
Page 3174: ...3154 ...
Page 3232: ...3212 ...