MySQL 5.0 FAQ: MySQL Chinese, Japanese, and Korean Character Sets
because gbk is a superset of gb2312, but eventually they try to insert a rarer Chinese character and it doesn't work. (See Bug #16072 for an example.)
Here, we try to clarify exactly what characters are legitimate in gb2312 or gbk, with reference to the official documents. Please check these references before reporting gb2312 or gbk bugs.
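The superset relationship, and the failure it leads to, can be reproduced outside MySQL with Python's built-in gb2312 and gbk codecs. This is a minimal sketch that assumes Python's codec tables match MySQL's for the characters used here. encodable is a hypothetical helper name, and 镕 (róng) is simply one example of a less common ideograph that is missing from the gb2312 repertoire but present in gbk.

```python
# Sketch: check which of two CJK character sets can store a given character.
# Uses only Python's built-in codecs; "encodable" is a hypothetical helper.


def encodable(text: str, charset: str) -> bool:
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    common = "中"  # a common ideograph: present in both gb2312 and gbk
    rare = "镕"    # a rarer ideograph: in gbk but not in gb2312

    for ch in (common, rare):
        print(f"U+{ord(ch):04X}",
              "gb2312:", encodable(ch, "gb2312"),
              "gbk:", encodable(ch, "gbk"))

    # A character that both can encode gets the same double-byte code,
    # because gbk keeps the gb2312 codes for the gb2312 repertoire.
    print(common.encode("gb2312").hex().upper())  # D6D0
    print(common.encode("gbk").hex().upper())     # D6D0
```

By the same reasoning, a column declared with the gb2312 character set has no code for 镕, so text containing it needs gbk or a Unicode character set such as utf8.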
• For a complete listing of the gb2312 characters, ordered according to the gb2312_chinese_ci collation, see gb2312.
• MySQL's gbk is in reality “Microsoft code page 936”. This differs from the official gbk for characters A1A4 (middle dot), A1AA (em dash), A6E0-A6F5, and A8BB-A8C0.
• For a listing of gbk/Unicode mappings, see http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT.
• For MySQL's listing of gbk characters, see gbk.
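Vendor mapping files such as CP936.TXT are plain text, so the gbk/Unicode table referenced above can be loaded with a few lines of code. The sketch below assumes the usual layout of these files: one mapping per line, consisting of a hexadecimal byte code, whitespace, a hexadecimal Unicode code point, and an optional comment after #. parse_mapping is a hypothetical name, and the SAMPLE lines are illustrative rather than copied from the real file.

```python
# Sketch: build a byte-code -> Unicode lookup table from a vendor mapping
# file such as CP936.TXT. Assumes "0xCODE<TAB>0xUNICODE<TAB>#comment" lines.
from collections.abc import Iterable


def parse_mapping(lines: Iterable[str]) -> dict[int, int]:
    """Return {code: code_point} for every line that has both hex columns."""
    table: dict[int, int] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop the comment and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:  # a code with no Unicode mapping: skip it
            continue
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


SAMPLE = """\
# illustrative excerpt, not copied from CP936.TXT
0xA1A4\t0x00B7\t#MIDDLE DOT
0xA1AA\t0x2014\t#EM DASH
0xD6D0\t0x4E2D\t#CJK UNIFIED IDEOGRAPH
0xFF\t#UNDEFINED
"""

if __name__ == "__main__":
    table = parse_mapping(SAMPLE.splitlines())
    for code, cp in sorted(table.items()):
        print(f"{code:04X} -> U+{cp:04X} {chr(cp)}")
```

With the full file loaded, a lookup such as table[0xA1A4] shows which Unicode character code page 936 assigns to one of the codes listed above as differing from the official gbk.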
B.11.2: I have inserted CJK characters into my table. Why does SELECT display them as “?” characters?
This problem is usually due to a setting in MySQL that doesn't match the settings for the application
program or the operating system. Here are some common steps for correcting these types of issues:
• Be certain of what MySQL version you are using. Use this statement to determine it:
SELECT VERSION();
• Make sure that the database is actually using the desired character set.
People often think that the client character set is always the same as either the server character set or the character set used for display purposes. However, both of these are false assumptions. You can verify which character set is actually in use by checking the result of SHOW CREATE TABLE tablename or, better yet, by using this statement (the database, table, and column names are string values, so they must be quoted):
SELECT character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = 'your_database_name'
AND table_name = 'your_table_name'
AND column_name = 'your_column_name';
• Determine the hexadecimal value of the character or characters that are not being displayed correctly.
You can obtain this information for a column column_name in the table table_name using the following query:
SELECT HEX(column_name) FROM table_name;
If the result contains 3F, that is the encoding for the ? character; this means that ? is the character actually stored in the column. This most often happens because of a problem converting a particular character from your client character set to the target character set.
• Make sure that a round trip is possible; that is, when you select literal (or _introducer hexadecimal-value), you obtain literal as a result.
For example, the Japanese Katakana character Pe (ペ) exists in all CJK character sets and has the code point value (hexadecimal coding) 0x30da. To test a round trip for this character, use this query:
SELECT 'ペ' AS `ペ`; /* or SELECT _ucs2 0x30da; */
If the result is not also ペ, then the round trip has failed.
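The two checks above, looking for 3F in HEX() output and testing a round trip, can be imitated outside the server. This minimal sketch uses Python's codecs as a stand-in for MySQL's character set conversion; it assumes, as an analogy rather than a description of MySQL internals, that converting a character the target character set lacks behaves like Python's errors="replace" option, which substitutes ?. The helper names are hypothetical; hex_upper only mimics the uppercase output format of HEX().

```python
# Sketch: detect "?" substitution and test a round trip, using Python codecs
# as an analogy for MySQL's character set conversion (an assumption).

PE = "\u30da"  # Katakana Pe, code point 0x30da


def hex_upper(data: bytes) -> str:
    """Format bytes the way HEX() displays a string: uppercase hex digits."""
    return data.hex().upper()


def store(text: str, charset: str) -> bytes:
    """Convert text to charset; unconvertible characters become b'?'."""
    return text.encode(charset, errors="replace")


def round_trips(text: str, charset: str) -> bool:
    """True if text survives conversion to charset and back unchanged."""
    return store(text, charset).decode(charset) == text


if __name__ == "__main__":
    # Code point check, analogous to SELECT _ucs2 0x30da;
    print(hex_upper(PE.encode("utf-16-be")))  # 30DA

    for cs in ("gb2312", "gbk", "euc_jp", "shift_jis", "euc_kr", "latin-1"):
        stored = store(PE, cs)
        print(f"{cs:10} HEX={hex_upper(stored):6} round trip: {round_trips(PE, cs)}")
    # latin-1 has no Pe, so the stored value is 3F ("?") and the round trip fails.
```

The CJK codecs all return Pe unchanged, while the latin-1 case reproduces the symptom from the question: the stored bytes are 3F, and selecting them back yields ? instead of ペ.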