F.43. unaccent
unaccent is a text search dictionary that removes accents (diacritic signs) from lexemes. It is a filtering dictionary, which means its output is always passed on to the next dictionary (if any). A normal dictionary behaves differently: once it recognizes a token, its output ends the processing of that token. This allows accent-insensitive processing for full text search.
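The difference between a filtering dictionary and a normal one can be modeled in a few lines. The following is a hypothetical Python model, not PostgreSQL code: the names `Dictionary`, `lexize_chain`, `UNACCENT`, `SIMPLE` and `STEM` are invented for illustration, and the toy "stemmer" only lowercases and strips a trailing "s". What happens when only filtering dictionaries accept a token is an assumption of the sketch (it emits nothing).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Dictionary:
    """Hypothetical model of a text search dictionary (not PostgreSQL code)."""
    name: str
    lexize: Callable[[str], Optional[str]]  # None means "token not recognized"
    is_filter: bool = False


def lexize_chain(token: str, chain: list[Dictionary]) -> Optional[str]:
    """Consult dictionaries in order, roughly as a configuration mapping does."""
    for d in chain:
        result = d.lexize(token)
        if result is None:
            continue  # not recognized: the next dictionary sees the same token
        if d.is_filter:
            token = result  # filtering dictionary: replace the token, keep going
            continue
        return result  # normal dictionary: its output ends processing
    # Assumption of this sketch: if no normal dictionary accepted the token,
    # nothing is emitted.
    return None


# Toy dictionaries, for illustration only.
ACCENTS = str.maketrans({"ô": "o", "é": "e"})
UNACCENT = Dictionary("unaccent", lambda t: t.translate(ACCENTS), is_filter=True)
SIMPLE = Dictionary("simple", lambda t: t.lower())
STEM = Dictionary("stem", lambda t: t.lower().removesuffix("s"))

print(lexize_chain("Hôtels", [UNACCENT, STEM]))  # hotel (filter feeds the stemmer)
print(lexize_chain("Hôtels", [SIMPLE, STEM]))    # hôtels (STEM is never consulted)
```

The second call shows why unaccent has to be a filter: a normal dictionary placed first would end processing and the stemmer would never see the token.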
The current implementation of unaccent cannot be used as a normalizing dictionary for the thesaurus dictionary.
F.43.1. Configuration
An unaccent dictionary accepts the following options:
- RULES is the base name of the file containing the list of translation rules. This file must be stored in $SHAREDIR/tsearch_data/ (where $SHAREDIR means the PostgreSQL installation's shared-data directory). Its name must end in .rules (which is not to be included in the RULES parameter).
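How a RULES value maps to a file name can be sketched as follows. This is an illustrative Python helper, not PostgreSQL's file lookup: `rules_file_path` is a hypothetical name, `/usr/share/postgresql` is only an example share directory (on a real installation, `pg_config --sharedir` reports it), and rejecting an explicit `.rules` suffix is this sketch's own choice.

```python
from pathlib import PurePosixPath


def rules_file_path(sharedir: str, rules: str) -> PurePosixPath:
    """Resolve a RULES value to the file it names (illustrative sketch).

    `sharedir` stands in for $SHAREDIR. The .rules suffix is appended here,
    which is why the RULES value itself must not contain it.
    """
    if rules.endswith(".rules"):
        # The sketch's own check, to surface the mistake early.
        raise ValueError("give RULES without the .rules suffix")
    return PurePosixPath(sharedir) / "tsearch_data" / f"{rules}.rules"


print(rules_file_path("/usr/share/postgresql", "unaccent"))
# /usr/share/postgresql/tsearch_data/unaccent.rules
```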
The rules file has the following format:
- Each line represents a pair, consisting of a character with accent followed by a character without accent. The first is translated into the second. For example,
À A
Á A
Â A
Ã A
Ä A
Å A
Æ A
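A simplified reading of this format, and of how the resulting pairs rewrite a word, might look like the following Python sketch. `parse_rules` and `apply_rules` are hypothetical names; PostgreSQL parses the file in its own C code and may treat blank or malformed lines differently from this sketch, which skips blank lines and rejects lines without exactly two fields.

```python
def parse_rules(text: str) -> dict[str, str]:
    """Parse 'accented unaccented' pairs, one pair per line."""
    rules: dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # this sketch tolerates blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        accented, plain = fields
        rules[accented] = plain
    return rules


def apply_rules(word: str, rules: dict[str, str]) -> str:
    """Translate each character that has a rule; keep all others as-is."""
    return "".join(rules.get(ch, ch) for ch in word)


# The example pairs from the text above.
EXAMPLE = "À A\nÁ A\nÂ A\nÃ A\nÄ A\nÅ A\nÆ A\n"
RULES = parse_rules(EXAMPLE)
print(apply_rules("Åland", RULES))  # Aland
```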
A more complete example, which is directly useful for most European languages, can be found in unaccent.rules, which is installed in $SHAREDIR/tsearch_data/ when the unaccent module is installed.
F.43.2. Usage
Installing the unaccent extension creates a text search template unaccent and a dictionary unaccent based on it. The unaccent dictionary has the default parameter setting RULES='unaccent', which makes it immediately usable with the standard unaccent.rules file. If you wish, you can alter the parameter, for example
mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
or create new dictionaries based on the template.
To test the dictionary, you can try:
mydb=# select ts_lexize('unaccent','Hôtel');
 ts_lexize
-----------
 {Hotel}
(1 row)
Here is an example showing how to insert the unaccent dictionary into a text search configuration:
mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
mydb=# ALTER TEXT SEARCH CONFIGURATION fr
        ALTER MAPPING FOR hword, hword_part, word
        WITH unaccent, french_stem;
mydb=# select to_tsvector('fr','Hôtels de la Mer');
    to_tsvector
-------------------
 'hotel':1 'mer':4
(1 row)

mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column?
----------
 t
(1 row)

mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
      ts_headline
------------------------
 <b>Hôtel</b> de la Mer
(1 row)
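The order in WITH unaccent, french_stem matters: unaccent runs first and, being a filter, hands its accent-free output to the stemmer. The Python sketch below imitates that pipeline for the example input only. Every component is a hypothetical stand-in: a regular expression replaces PostgreSQL's parser, a tiny translation table replaces unaccent.rules, and `toy_french_stem` (lowercase, drop a few stop words, strip a plural "s") is nowhere near the real Snowball french_stem dictionary.

```python
import re
from typing import Dict, List, Optional

# Hypothetical stand-ins for the real components of the fr configuration.
ACCENTS = str.maketrans({"ô": "o", "é": "e", "è": "e", "à": "a"})
STOPWORDS = {"de", "la", "le", "les"}  # tiny illustrative subset


def toy_french_stem(word: str) -> Optional[str]:
    """Toy stemmer: lowercase, drop stop words, strip a plural 's'."""
    lowered = word.lower()
    if lowered in STOPWORDS:
        return None
    return lowered.removesuffix("s")


def toy_to_tsvector(text: str) -> Dict[str, List[int]]:
    """Map each lexeme to its 1-based word positions."""
    vector: Dict[str, List[int]] = {}
    for pos, word in enumerate(re.findall(r"\w+", text), start=1):
        lexeme = toy_french_stem(word.translate(ACCENTS))  # unaccent, then stem
        if lexeme is None:
            continue  # stop word: not indexed, but it still occupies a position
        vector.setdefault(lexeme, []).append(pos)
    return vector


print(toy_to_tsvector("Hôtels de la Mer"))  # {'hotel': [1], 'mer': [4]}
query = toy_french_stem("Hotels".translate(ACCENTS))
print(query in toy_to_tsvector("Hôtel de la Mer"))  # True
```

For this one input, the toy pipeline yields the same lexemes and positions as the to_tsvector output shown above, and the unaccented, stemmed query term matches the accented document.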
F.43.3. Functions
The unaccent() function removes accents (diacritic signs) from a given string. It is essentially a wrapper around the unaccent dictionary, but it can be used outside normal text search contexts.
unaccent([dictionary regdictionary, ] string text) returns text
If the dictionary argument is omitted, the text search dictionary named unaccent and appearing in the same schema as the unaccent() function itself is used.
For example:
SELECT unaccent('unaccent', 'Hôtel');
SELECT unaccent('Hôtel');
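The optional leading dictionary argument can be mirrored with a variadic Python function. This is a hypothetical model only: `DICTIONARIES` stands in for the dictionaries the real function can reach, the lookup of "the unaccent dictionary in the function's own schema" is reduced to the fixed key "unaccent", and the translation table is a toy subset rather than unaccent.rules.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for the dictionaries visible to unaccent().
_ACCENTS = str.maketrans({"ô": "o", "é": "e", "à": "a"})
DICTIONARIES: Dict[str, Callable[[str], str]] = {
    "unaccent": lambda s: s.translate(_ACCENTS),
}


def unaccent(*args: str) -> str:
    """Mimic unaccent([dictionary, ] string) with an optional first argument."""
    if len(args) == 1:
        dictionary, string = "unaccent", args[0]  # default dictionary name
    elif len(args) == 2:
        dictionary, string = args
    else:
        raise TypeError("unaccent() takes 1 or 2 arguments")
    return DICTIONARIES[dictionary](string)


print(unaccent("unaccent", "Hôtel"))  # Hotel
print(unaccent("Hôtel"))              # Hotel
```

Keeping the dictionary as the first, optional argument mirrors the SQL signature, which is why the sketch uses `*args` instead of a trailing keyword parameter.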
© 1996–2019 The PostgreSQL Global Development Group
Licensed under the PostgreSQL License.
https://www.postgresql.org/docs/9.4/unaccent.html