Next: Emacs API, Previous: Examples, Up: GNU Libidn [Contents][Index]
GNU Libidn (idn) – Internationalized Domain Names command line tool
idn
allows internationalized string preparation
(‘stringprep’), encoding and decoding of punycode data, and IDNA
ToASCII/ToUnicode operations to be performed on the command line.
If strings are specified on the command line, they are used as input
and the computed output is printed to standard output stdout
.
If no strings are specified on the command line, the program read
data, line by line, from the standard input stdin
, and print
the computed output to standard output. What processing is performed
(e.g., ToASCII, or Punycode encode) is indicated by options. If any
errors are encountered, the execution of the applications is aborted.
All strings are expected to be encoded in the preferred charset used
by your locale. Use --debug
to find out what this charset is.
You can override the charset used by setting environment variable
CHARSET
.
To process a string that starts with -
, for example
-foo
, use --
to signal the end of parameters, as in
idn --quiet -a -- -foo
.
idn
recognizes these commands:
-h, --help Print help and exit -V, --version Print version and exit -s, --stringprep Prepare string according to nameprep profile -d, --punycode-decode Decode Punycode -e, --punycode-encode Encode Punycode -a, --idna-to-ascii Convert to ACE according to IDNA (default mode) -u, --idna-to-unicode Convert from ACE according to IDNA --allow-unassigned Toggle IDNA AllowUnassigned flag (default off) --usestd3asciirules Toggle IDNA UseSTD3ASCIIRules flag (default off) --no-tld Don't check string for TLD specific rules -n, --nfkc Normalize string according to Unicode v3.2 NFKC -p, --profile=STRING Use specified stringprep profile instead --debug Print debugging information --quiet Silent operation
The CHARSET environment variable can be used to override what character set to be used for decoding incoming data (i.e., on the command line or on the standard input stream), and to encode data to the standard output. If your system is set up correctly, however, the application will guess which character set is used automatically. Example usage:
$ CHARSET=ISO-8859-1 idn --punycode-encode ...
Standard usage, reading input from standard input. The parameter
--quiet
disable printing copyright, license and usage
instructions.
jas@latte:~$ idn --quiet räksmörgås.se xn--rksmrgs-5wao1o.se jas@latte:~$
Reading input from command line:
jas@latte:~$ idn --quiet räksmörgås.se blåbærgrød.no xn--rksmrgs-5wao1o.se xn--blbrgrd-fxak7p.no jas@latte:~$
Accessing a specific StringPrep profile directly:
jas@latte:~$ idn --quiet --profile=SASLprep --stringprep teßtª teßta jas@latte:~$
Getting character data encoded right, and making sure Libidn use the
same encoding, can be difficult. The reason for this is that most
systems encode character data in more than one character encoding,
i.e., using UTF-8
together with ISO-8859-1
or
ISO-2022-JP
. This problem is likely to continue to exist until
only one character encoding come out as the evolutionary winner, or
(more likely, at least to some extents) forever.
The first step to troubleshooting character encoding problems with Libidn is to use the ‘--debug’ parameter to find out which character set encoding ‘idn’ believe your locale uses.
jas@latte:~$ idn --debug --quiet "" system locale uses charset `UTF-8'. jas@latte:~$
If it prints ANSI_X3.4-1968
(i.e., US-ASCII
), this
indicate you have not configured your locale properly. To configure
the locale, you can, for example, use ‘LANG=sv_SE.UTF-8; export
LANG’ at a /bin/sh
prompt, to set up your locale for a Swedish
environment using UTF-8
as the encoding.
Sometimes ‘idn’ appear to be unable to translate from your system
locale into UTF-8
(which is used internally), and you get an
error like the following:
jas@latte:~$ idn --quiet foo idn: could not convert from ISO-8859-1 to UTF-8. jas@latte:~$
The simplest explanation is that you haven’t installed the ‘iconv’ conversion tools. You can find it as a standalone library in GNU Libiconv (https://www.gnu.org/software/libiconv/). On many GNU/Linux systems, this library is part of the system, but you may have to install additional packages (e.g., ‘glibc-locale’ for Debian) to be able to use it.
Another explanation is that the error is correct and you are feeding
‘idn’ invalid data. This can happen inadvertently if you are not
careful with the character set encoding you use. For example, if your
shell run in a ISO-8859-1
environment, and you invoke
‘idn’ with the ‘CHARSET’ environment variable as follows,
you will feed it ISO-8859-1
characters but force it to believe
they are UTF-8
. Naturally this will lead to an error, unless
the byte sequences happen to be valid UTF-8
. Note that even if
you don’t get an error, the output may be incorrect in this situation,
because ISO-8859-1
and UTF-8
does not in general encode
the same characters as the same byte sequences.
jas@latte:~$ idn --quiet --debug "" system locale uses charset `ISO-8859-1'. jas@latte:~$ CHARSET=UTF-8 idn --quiet --debug räksmörgås system locale uses charset `UTF-8'. input[0] = U+0072 input[1] = U+4af3 input[2] = U+006d input[3] = U+1b29e5 input[4] = U+0073 output[0] = U+0078 output[1] = U+006e output[2] = U+002d output[3] = U+002d output[4] = U+0072 output[5] = U+006d output[6] = U+0073 output[7] = U+002d output[8] = U+0068 output[9] = U+0069 output[10] = U+0036 output[11] = U+0064 output[12] = U+0035 output[13] = U+0039 output[14] = U+0037 output[15] = U+0035 output[16] = U+0035 output[17] = U+0032 output[18] = U+0061 xn--rms-hi6d597552a jas@latte:~$
The sense moral here is to forget about ‘CHARSET’ (configure your locales properly instead) unless you know what you are doing, and if you want to use it, do it carefully, after verifying with ‘--debug’ that you get the desired results.
Next: Emacs API, Previous: Examples, Up: GNU Libidn [Contents][Index]