10.4.4. Adding a UCA Collation to a Unicode Character Set
This section describes how to add a UCA collation for a Unicode character set by writing the <collation> element within a <charset> character set description in the MySQL Index.xml file.
The procedure described here does not require recompiling MySQL. It uses a subset of the Locale Data Markup Language (LDML) specification, which is available at http://www.unicode.org/reports/tr35/.
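To make that structure concrete (a <collation> element whose name and id attributes associate the name with the ID, holding a <rules> element, appended to the relevant <charset> element), here is a minimal Python sketch using the standard xml.etree.ElementTree module. The add_collation function, its parameters, the collation name utf8_example_ci, and the rule fragments are all hypothetical placeholders, not a real collation. The duplicate-ID check only sees IDs listed in the file itself. Because ElementTree drops comments and the XML declaration when it serializes a tree, treat this as an illustration of the edit rather than a tool for rewriting the installed file.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def add_collation(index_xml: str, charset: str, name: str, coll_id: int,
                  rules: list[str]) -> str:
    """Return a copy of the Index.xml text with a new <collation> element,
    holding a <rules> element, appended to the named <charset> element.

    Hypothetical helper for illustration only: ElementTree does not keep the
    file's comments or XML declaration when it serializes the tree.
    """
    root = ET.fromstring(index_xml)

    # Reject an ID already used by some <collation> element in this file.
    # (This cannot see collations that are not listed in the file.)
    if any(c.get("id") == str(coll_id) for c in root.iter("collation")):
        raise ValueError(f"collation id {coll_id} is already used in Index.xml")

    # Find the <charset name="..."> element the new collation belongs to.
    target = next((cs for cs in root.iter("charset") if cs.get("name") == charset), None)
    if target is None:
        raise ValueError(f"no <charset name={charset!r}> element found")

    # <collation name="..." id="..."> associates the name with the ID.
    collation = ET.SubElement(target, "collation", name=name, id=str(coll_id))
    rules_elem = ET.SubElement(collation, "rules")
    # Each rule is an LDML fragment string, e.g. "<reset>a</reset>".
    for fragment in rules:
        rules_elem.append(ET.fromstring(fragment))

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder Index.xml content; the real file is much larger.
    sample = ('<charsets><charset name="utf8">'
              '<collation name="utf8_general_ci" id="33"/></charset></charsets>')
    print(add_collation(sample, "utf8", "utf8_example_ci", 252,
                        ["<reset>a</reset>", "<p>b</p>"]))
```

The new element is appended after the existing children of <charset>, which matches the instruction to add a <collation> element to that character set's description.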
This method of adding collations is supported as of MySQL 5.0.46. With this method, you need not define the entire collation. Instead, you begin with an existing “base” collation and describe the new collation in terms of how it differs from the base collation. The following table lists the base collations of the Unicode character sets for which UCA collations can be defined.
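Table 10.1. MySQL Character Sets Available for User-Defined UCA Collations
+---------------+-----------------+
| Character Set | Base Collation  |
+---------------+-----------------+
| utf8          | utf8_unicode_ci |
| ucs2          | ucs2_unicode_ci |
+---------------+-----------------+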
The following sections show how to add a collation that is defined using LDML syntax, and provide a
summary of LDML rules supported in MySQL.
10.4.4.1. Defining a UCA Collation using LDML Syntax
To add a UCA collation for a Unicode character set without recompiling MySQL, use the following procedure. If you are unfamiliar with the LDML rules used to describe the collation's sort characteristics, see Section 10.4.4.2, “LDML Syntax Supported in MySQL”.
The example adds a collation named utf8_phone_ci to the utf8 character set. The collation is designed for a scenario involving a Web application in which users post their names and phone numbers. Phone numbers can be given in very different formats:
+7-12345-67
+7-12-345-67
+7 12 345 67
+7 (12) 345 67
+71234567
The problem with values like these is that the many permissible formats make searching for a specific phone number very difficult. The solution is to define a new collation that reorders the punctuation characters, making them ignorable.
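The intended effect can be approximated in Python: if the punctuation characters that appear in the formats above (space, parentheses, plus sign, and hyphen) are skipped during comparison, all five values reduce to the same key. This is a simplified model that assumes these five characters are the only ignorable ones; a real UCA collation compares collation weights rather than stripping characters, and the functions phone_key and phone_equal are hypothetical, not part of MySQL.

```python
# Assumption: these five characters are the ones the new collation makes ignorable.
IGNORABLE = frozenset(" ()+-")


def phone_key(value: str) -> str:
    """Drop ignorable characters, approximating what the collation compares."""
    return "".join(ch for ch in value if ch not in IGNORABLE)


def phone_equal(a: str, b: str) -> bool:
    """True if two phone numbers compare equal once ignorable characters are skipped."""
    return phone_key(a) == phone_key(b)


if __name__ == "__main__":
    formats = ["+7-12345-67", "+7-12-345-67", "+7 12 345 67", "+7 (12) 345 67", "+71234567"]
    # Every format reduces to the same key, so one search matches all of them.
    print({phone_key(f) for f in formats})  # {'71234567'}
```

Because only the listed characters are discarded, the digits still matter: +7-12345-67 and +7-12345-68 continue to compare as different, which is what a phone-number search needs.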
1. Choose a collation ID, as shown in Section 10.4.2, “Choosing a Collation ID”. The following steps use an ID of 252.
2. Modify the Index.xml configuration file. This file is located in the directory named by the character_sets_dir system variable. You can check the variable value as follows, although the path name might be different on your system:
mysql> SHOW VARIABLES LIKE 'character_sets_dir';
+--------------------+----------------------------------------+
| Variable_name      | Value                                  |
+--------------------+----------------------------------------+
| character_sets_dir | /usr/local/mysql/share/mysql/charsets/ |
+--------------------+----------------------------------------+
3. Choose a name for the collation and list it in the Index.xml file. In addition, you'll need to provide the collation ordering rules. Find the <charset> element for the character set to which the collation is being added, and add a <collation> element that indicates the collation name and ID, to associate the name with the ID. Within the <collation> element, provide a <rules> element containing the ordering rules:
<charset name="utf8">