unidecode

This module is based on Python's Unidecode module by Tomaz Solc, which in turn is based on the Text::Unidecode Perl module by Sean M. Burke (http://search.cpan.org/~sburke/Text-Unidecode-0.04/lib/Text/Unidecode.pm).

It provides a single proc, unidecode, that transliterates Unicode to ASCII: it finds the sequence of ASCII characters that most closely approximates a Unicode string.

For example, the closest ASCII approximation of the string "Äußerst" is "Ausserst". Some information is lost in this transformation, of course, since several Unicode strings can map to the same ASCII representation, so the transformation is strictly one-way. However, a human reader will probably still be able to guess from context which original string was meant.
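A minimal usage sketch of the example above, assuming the module is importable as std/unidecode in your Nim version. It also illustrates why the transformation is one-way: the plain ASCII spelling "Ausserst" produces the same result as "Äußerst", so the output alone cannot tell you which input was meant.

```nim
import std/unidecode

# Transliterate a German word to its closest ASCII approximation.
let ascii = unidecode("Äußerst")
echo ascii  # Ausserst

# One-way: the ASCII spelling maps to the same result, so "Ausserst"
# alone does not reveal which of the two inputs was meant.
doAssert unidecode("Ausserst") == ascii
```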

This module needs the data file "unidecode.dat" to work. By default, this file is embedded into your application as a resource. Alternatively, you can compile with --define:noUnidecodeTable and call the loadUnidecodeTable proc to initialize this module at run time.
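A sketch of the opt-out setup described above. The build command and the location of unidecode.dat are assumptions; adjust the path to wherever you ship the file. Because the table must be loaded by the main thread before any thread calls unidecode, the call sits at the top level of the main module, ahead of any other work. Guarding it with `when defined(noUnidecodeTable)` keeps the same source working in the default, embedded-table build.

```nim
# Assumed build command when leaving the table out of the binary:
#   nim c --define:noUnidecodeTable app.nim
import std/unidecode

when defined(noUnidecodeTable):
  # Run once, on the main thread, before any thread calls unidecode.
  # "unidecode.dat" is the proc's default argument; a relative path is
  # resolved against the current working directory (placement assumed).
  loadUnidecodeTable("unidecode.dat")

echo unidecode("Äußerst")
```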

Imports

unicode, strutils

Procs

proc loadUnidecodeTable(datafile = "unidecode.dat") {.raises: [], tags: [].}
Loads the data file that unidecode needs to work. This is only required if the module was compiled with the --define:noUnidecodeTable switch. It must be called by the main thread before any thread calls unidecode.
proc unidecode(s: string): string {.raises: [], tags: [].}

Finds the sequence of ASCII characters that is the closest approximation to the UTF-8 string s.

Example:

unidecode("北京")

Results in: "Bei Jing"
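To illustrate what "closest approximation" means mechanically, here is a toy transliterator that copies ASCII through and maps every other code point through a small lookup table. The table and the name toyUnidecode are hypothetical; this is not the module's actual implementation, whose mappings come from unidecode.dat and cover far more characters. In particular, this toy simply drops characters it has no entry for, such as the CJK ideographs in the example above.

```nim
import std/[unicode, tables]

# Hypothetical mini table: code point -> ASCII replacement.
# The real module reads its mappings from unidecode.dat; this is only a toy.
let toyTable = {
  0x00C4: "A",   # Ä
  0x00E4: "a",   # ä
  0x00D6: "O",   # Ö
  0x00F6: "o",   # ö
  0x00DF: "ss",  # ß
}.toTable

proc toyUnidecode(s: string): string =
  ## Decodes `s` as UTF-8 rune by rune. ASCII runes are copied as-is;
  ## other runes are replaced by their table entry, or dropped if absent.
  result = ""
  for r in s.runes:
    let cp = r.int
    if cp < 0x80:
      result.add char(cp)
    else:
      result.add toyTable.getOrDefault(cp, "")

echo toyUnidecode("Äußerst")  # Ausserst
```

A flat per-code-point table keeps each lookup cheap and makes the result depend only on the individual character, which is also why context-dependent readings cannot be expressed this way.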


© 2006–2021 Andreas Rumpf
Licensed under the MIT License.
https://nim-lang.org/docs/unidecode.html