It’s sometimes useful, or even necessary, to represent strings containing accented letters or other characters outside the US-ASCII set as pure ASCII. For instance:
perché ==> perche
This transliteration might be desirable for various reasons, mainly to use the string where only ASCII is supported (or preferred). Some folks call this process deaccenting, as it’s commonly used to remove accents from words so that they can be compared. In practice, accents are not the only problem, and you’ll want to handle things like:
straße ==> strasse
Tromsø ==> Tromso
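To see why accents are not the whole story, here is a minimal sketch of the accent-only approach. It assumes the core Unicode::Normalize module, which this post does not otherwise rely on, and the helper name strip_marks is made up for illustration. It decomposes each character (NFD) and then deletes the combining marks:

```perl
use strict;
use warnings;
use utf8;                              # the source contains literal non-ASCII characters
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';    # two of the results are still non-ASCII

# Hypothetical helper: canonical decomposition, then drop combining marks.
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);       # "é" becomes "e" + U+0301 COMBINING ACUTE ACCENT
    $decomposed =~ s/\p{Mn}+//g;       # remove nonspacing marks (the accents)
    return $decomposed;
}

print strip_marks($_), "\n" for qw(perché straße Tromsø);
```

This prints perche, straße and Tromsø: neither ß nor ø has a canonical decomposition, so stripping marks leaves them untouched.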
There’s a CPAN module which can help here: Text::Unidecode by Sean M. Burke.
use utf8;
use Modern::Perl;
use Text::Unidecode;

for my $word (qw/Tromsø perché straße/) {
    # ASCII representation
    say unidecode($word);
}
This will print, as expected:
Tromso
perche
strasse
As the module documentation points out, the transliteration is not meticulous, so it doesn’t always do a good job. However, Text::Unidecode works nicely with Western European languages, along with some others.