Natural Language Full-Text Searches
937
| 2 | 0 |
| 3 | 0.66266459226608 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
+----+-----------------------------------------+
6 rows in set (0.00 sec)
The following example is more complex. The query returns the relevance values and it also sorts the
rows in order of decreasing relevance. To achieve this result, you should specify
MATCH()
[934]
twice: once in the
SELECT
list and once in the
WHERE
clause. This causes no additional overhead,
because the MySQL optimizer notices that the two
MATCH()
[934]
calls are identical and invokes the
full-text search code only once.
mysql>
SELECT id, body, MATCH (title,body) AGAINST
->
('Security implications of running MySQL as root') AS score
->
FROM articles WHERE MATCH (title,body) AGAINST
->
('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body | score |
+----+-------------------------------------+-----------------+
| 4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
| 6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)
The MySQL
FULLTEXT
implementation regards any sequence of true word characters (letters, digits,
and underscores) as a word. That sequence may also contain apostrophes (“
'
”), but not more than one
in a row. This means that
aaa'bbb
is regarded as one word, but
aaa''bbb
is regarded as two words.
Apostrophes at the beginning or the end of a word are stripped by the
FULLTEXT
parser;
'aaa'bbb'
would be parsed as
aaa'bbb
.
The
FULLTEXT
parser determines where words start and end by looking for certain delimiter
characters; for example, “
” (space), “
,
” (comma), and “
.
” (period). If words are not separated by
delimiters (as in, for example, Chinese), the
FULLTEXT
parser cannot determine where a word begins
or ends. To be able to add words or other indexed terms in such languages to a
FULLTEXT
index, you
must preprocess them so that they are separated by some arbitrary delimiter such as “
"
”.
Some words are ignored in full-text searches:
• Any word that is too short is ignored. The default minimum length of words that are found by full-text
searches is four characters.
• Words in the stopword list are ignored. A stopword is a word such as “the” or “some” that is so
common that it is considered to have zero semantic value. There is a built-in stopword list, but it can
be overwritten by a user-defined list.
The default stopword list is given in
Section 12.9.4, “Full-Text Stopwords”
. The default minimum word
length and stopword list can be changed as described in
Section 12.9.6, “Fine-Tuning MySQL Full-Text
Search”
.
Every correct word in the collection and in the query is weighted according to its significance in the
collection or query. Consequently, a word that is present in many documents has a lower weight
(and may even have a zero weight), because it has lower semantic value in this particular collection.
Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to
compute the relevance of the row.
Such a technique works best with large collections (in fact, it was carefully tuned this way). For very
small tables, word distribution does not adequately reflect their semantic value, and this model may
sometimes produce bizarre results. For example, although the word “MySQL” is present in every row of
the
articles
table shown earlier, a search for the word produces no results:
mysql>
SELECT * FROM articles
->
WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)
Summary of Contents for 5.0
Page 1: ...MySQL 5 0 Reference Manual ...
Page 18: ...xviii ...
Page 60: ...40 ...
Page 396: ...376 ...
Page 578: ...558 ...
Page 636: ...616 ...
Page 844: ...824 ...
Page 1234: ...1214 ...
Page 1427: ...MySQL Proxy Scripting 1407 ...
Page 1734: ...1714 ...
Page 1752: ...1732 ...
Page 1783: ...Configuring Connector ODBC 1763 ...
Page 1793: ...Connector ODBC Examples 1773 ...
Page 1839: ...Connector Net Installation 1819 2 You must choose the type of installation to perform ...
Page 2850: ...2830 ...
Page 2854: ...2834 ...
Page 2928: ...2908 ...
Page 3000: ...2980 ...
Page 3122: ...3102 ...
Page 3126: ...3106 ...
Page 3174: ...3154 ...
Page 3232: ...3212 ...