Java API (GNU Libidn 1.42)

12 Java API

Libidn has been ported to the Java programming language, and as a consequence most of the API is available to native Java applications. This section contains notes on this support; complete documentation is pending.

If Libidn has been built with Java support (see Downloading and Installing), the Java library will be placed in java/libidn-1.42.jar. The source code is located under java/ in the Maven directory layout, and there is a Maven pom.xml build script as well. Source code files are in java/src/main/java/gnu/inet/encoding/.

Overview
Miscellaneous Programs
Possible Problems
A Note on Java and Unicode

12.1 Overview

This package provides a Java implementation of the Internationalized Domain Names in Applications (IDNA) standard. It is written entirely in Java and does not require any additional libraries to be set up.

The gnu.inet.encoding.IDNA class offers two public static methods, toASCII and toUnicode, which can be used as follows:

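// toASCII throws gnu.inet.encoding.IDNAException if the name cannot be converted
gnu.inet.encoding.IDNA.toASCII("blöds.züg");                 // returns "xn--blds-6qa.xn--zg-xka"
gnu.inet.encoding.IDNA.toUnicode("xn--blds-6qa.xn--zg-xka");  // returns "blöds.züg"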
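The calls above need libidn.jar on the classpath. As a cross-check of the example values that runs on a plain JDK, the sketch below swaps in java.net.IDN, the JDK's own RFC 3490 implementation (available since Java 6). It is not part of Libidn, and the class name IdnaCrossCheck is made up for illustration. The non-ASCII input is written with Unicode escapes so that the file compiles under any source encoding.

```java
import java.net.IDN;

// Cross-checks the IDNA example strings using the JDK's java.net.IDN class.
// This is NOT the Libidn API; it is swapped in only so the sketch runs
// without libidn.jar on the classpath.
// U+00F6 is o with diaeresis, U+00FC is u with diaeresis.
class IdnaCrossCheck {
    static final String UNICODE_NAME = "bl\u00f6ds.z\u00fcg";
    static final String ACE_NAME = "xn--blds-6qa.xn--zg-xka";

    static String toAscii(String name) {
        // Applies ToASCII to each dot-separated label; throws
        // IllegalArgumentException if a label cannot be converted.
        return IDN.toASCII(name);
    }

    static String toUnicode(String name) {
        // ToUnicode never fails; labels it cannot decode are returned unchanged.
        return IDN.toUnicode(name);
    }

    public static void main(String[] args) {
        System.out.println(toAscii(UNICODE_NAME));                    // xn--blds-6qa.xn--zg-xka
        System.out.println(toUnicode(ACE_NAME).equals(UNICODE_NAME)); // true
    }
}
```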

12.2 Miscellaneous Programs

The java/src/util/java/ directory contains several programs that are related to the Java part of GNU Libidn, but that don’t need to be included in the main source tree or the JAR file.

GenerateRFC3454
GenerateNFKC
TestIDNA
TestNFKC

12.2.1 GenerateRFC3454

This program parses RFC3454 and creates the RFC3454.java source file that is required during the Stringprep phase.

The RFC can be found at various locations, for example at http://www.ietf.org/rfc/rfc3454.txt.

Invoke the program as follows:

$ java GenerateRFC3454
Creating RFC3454.java... Ok.
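Entries in the RFC 3454 tables are lines holding either a single code point (such as 0221), a range (such as 0234-024F), or, in the mapping tables of appendix B, a semicolon-separated mapping (such as 0041; 0061; Case map). The following is a hypothetical sketch of how such lines could be parsed; it is not the actual GenerateRFC3454 source, and the names Rfc3454Line, parseRange and parseMapping are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parsing RFC 3454 table lines; NOT the actual
// GenerateRFC3454 source. All names here are invented for illustration.
class Rfc3454Line {
    // Parses "0221" or "0234-024F" into {first, last} code points.
    static int[] parseRange(String line) {
        String entry = line.trim();
        int dash = entry.indexOf('-');
        if (dash < 0) {
            int cp = Integer.parseInt(entry, 16);
            return new int[] {cp, cp};
        }
        return new int[] {
            Integer.parseInt(entry.substring(0, dash), 16),
            Integer.parseInt(entry.substring(dash + 1), 16)
        };
    }

    // Parses "00DF; 0073 0073; Case map" into the source code point followed
    // by its replacement code points. "00AD; ; Map to nothing" yields {0x00AD}.
    static int[] parseMapping(String line) {
        String[] fields = line.trim().split(";");
        List<Integer> cps = new ArrayList<>();
        cps.add(Integer.parseInt(fields[0].trim(), 16));
        for (String hex : fields[1].trim().split("\\s+")) {
            if (!hex.isEmpty()) {          // empty for "map to nothing" entries
                cps.add(Integer.parseInt(hex, 16));
            }
        }
        int[] result = new int[cps.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = cps.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] r = parseRange("   0234-024F");
        System.out.printf("%04X..%04X%n", r[0], r[1]);          // 0234..024F
        int[] m = parseMapping("   00DF; 0073 0073; Case map");
        System.out.println(m.length);                            // 3
    }
}
```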

12.2.2 GenerateNFKC

The GenerateNFKC program parses the Unicode character database files and generates all the tables required for NFKC. It requires two files from version 3.2 of the Unicode data files: UnicodeData.txt and CompositionExclusions.txt. Note that RFC3454 (Stringprep) specifies that Unicode version 3.2 is to be used, not the latest version.

The Unicode data files can be found at http://www.unicode.org/Public/.

Invoke the program as follows:

$ java GenerateNFKC
Creating CombiningClass.java... Ok.
Creating DecompositionKeys.java... Ok.
Creating DecompositionMappings.java... Ok.
Creating Composition.java... Ok.
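Each line of UnicodeData.txt is a list of semicolon-separated fields: field 0 is the code point, field 3 the canonical combining class, and field 5 the decomposition, which starts with a tag such as <compat> when it is a compatibility rather than a canonical decomposition (NFKC applies both kinds). The hypothetical sketch below extracts these fields from one line to illustrate the file format; it is not the actual GenerateNFKC code, and the class name UnicodeDataLine is invented.

```java
// Hypothetical sketch, NOT the GenerateNFKC source: extracts the fields that
// NFKC tables are built from out of a single UnicodeData.txt line.
class UnicodeDataLine {
    final int codePoint;          // field 0
    final int combiningClass;     // field 3, canonical combining class
    final boolean compatibility;  // true if field 5 starts with a <tag>
    final int[] decomposition;    // field 5 without its tag; empty if none

    UnicodeDataLine(String line) {
        String[] f = line.split(";", -1);   // limit -1 keeps empty fields
        codePoint = Integer.parseInt(f[0], 16);
        combiningClass = Integer.parseInt(f[3]);
        String decomp = f[5].trim();
        compatibility = decomp.startsWith("<");
        if (compatibility) {
            // drop a tag such as "<compat>" or "<font>"
            decomp = decomp.substring(decomp.indexOf('>') + 1).trim();
        }
        if (decomp.isEmpty()) {
            decomposition = new int[0];
        } else {
            String[] hex = decomp.split("\\s+");
            decomposition = new int[hex.length];
            for (int i = 0; i < hex.length; i++) {
                decomposition[i] = Integer.parseInt(hex[i], 16);
            }
        }
    }

    public static void main(String[] args) {
        UnicodeDataLine fi = new UnicodeDataLine(
            "FB01;LATIN SMALL LIGATURE FI;Ll;0;L;<compat> 0066 0069;;;;N;;;;;");
        System.out.println(fi.compatibility + " " + fi.decomposition.length); // true 2
    }
}
```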

12.2.3 TestIDNA

The TestIDNA program can be used to test the IDNA implementation manually or against Simon Josefsson’s test vectors.

The test vectors can be found at the Libidn homepage, https://www.gnu.org/software/libidn/.

To test the transformation manually, use:

$ java -cp .:/usr/share/java/libidn.jar TestIDNA -a <string to test>
Input: <string to test>
Output: <toASCII(string to test)>
$ java -cp .:/usr/share/java/libidn.jar TestIDNA -u <string to test>
Input: <string to test>
Output: <toUnicode(string to test)>

To test against draft-josefsson-idn-test-vectors.html, use:

$ java -cp .:/usr/share/java/libidn.jar TestIDNA -t
No errors detected!

12.2.4 TestNFKC

The TestNFKC program can be used to test the NFKC implementation manually or against the NormalizationTest.txt file from the Unicode data files.

To test the normalization manually, use:

$ java -cp .:/usr/share/java/libidn.jar TestNFKC <string to test>
Input: <string to test>
Output: <nfkc version of the string to test>

To test against NormalizationTest.txt, use:

$ java -cp .:/usr/share/java/libidn.jar TestNFKC
No errors detected!
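Each test line of NormalizationTest.txt has five columns c1;c2;c3;c4;c5, and the file's header states that NFKC must map every one of c1 through c5 to c4. The sketch below checks a single such line, swapping in the JDK's java.text.Normalizer (Java 6+) for Libidn's own NFKC code; the class name NfkcLineCheck is invented. Note that the JDK normalizes according to the Unicode version of the running Java release rather than version 3.2, so its results can differ from Libidn's for characters that changed after 3.2.

```java
import java.text.Normalizer;

// Sketch: NFKC conformance check for one NormalizationTest.txt test line,
// using the JDK's java.text.Normalizer rather than Libidn's NFKC code.
// Comment lines and "@Part" header lines of the file are not handled here.
class NfkcLineCheck {
    // Converts a column such as "0044 0307" into a String.
    static String decode(String column) {
        StringBuilder sb = new StringBuilder();
        for (String hex : column.trim().split("\\s+")) {
            sb.appendCodePoint(Integer.parseInt(hex, 16));
        }
        return sb.toString();
    }

    // Returns true if NFKC maps each of the columns c1..c5 to c4.
    static boolean checkNfkc(String line) {
        int hash = line.indexOf('#');                 // strip trailing comment
        String data = hash < 0 ? line : line.substring(0, hash);
        String[] cols = data.split(";");
        String expected = decode(cols[3]);
        for (int i = 0; i < 5; i++) {
            String actual = Normalizer.normalize(decode(cols[i]), Normalizer.Form.NFKC);
            if (!actual.equals(expected)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+FB01 LATIN SMALL LIGATURE FI has the NFKC form "fi" (0066 0069)
        System.out.println(checkNfkc(
            "FB01;FB01;FB01;0066 0069;0066 0069; # LATIN SMALL LIGATURE FI")); // true
    }
}
```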

12.3 Possible Problems

Beware of Bugs: This Java API needs a lot more testing, especially with "exotic" character sets. While it works for me, it may not work for you.

Encoding of your Java sources: If you are using non-ASCII characters in your Java source code, make sure javac compiles your programs with the correct encoding. If necessary, specify the encoding with javac's -encoding option, for example javac -encoding UTF-8.
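Another way to avoid depending on the source encoding is to write non-ASCII characters as Unicode escapes, which javac translates before compiling whatever -encoding is in effect. The sketch below spells the example domain name from the Overview this way; the class name EscapedSource is invented for illustration.

```java
// Non-ASCII characters written as Unicode escapes, so this file is pure
// ASCII and compiles identically whatever encoding javac assumes.
// U+00F6 is LATIN SMALL LETTER O WITH DIAERESIS,
// U+00FC is LATIN SMALL LETTER U WITH DIAERESIS.
class EscapedSource {
    static final String DOMAIN = "bl\u00f6ds.z\u00fcg";

    public static void main(String[] args) {
        System.out.println(DOMAIN.length());                         // 9
        System.out.println(Integer.toHexString(DOMAIN.charAt(2)));   // f6
        // To keep literal characters instead, compile with e.g.:
        //   javac -encoding UTF-8 EscapedSource.java
    }
}
```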

Java Unicode handling: Java 1.4 only handles 16-bit Unicode code points (i.e. characters in the Basic Multilingual Plane), so this implementation ignores all references to so-called Supplementary Characters (U+10000 to U+10FFFF). Starting from Java 1.5, Java also supports these characters, but supporting them will require changes to this library. See also the next section.

12.4 A Note on Java and Unicode

This library uses Java’s built-in char datatype. A char holds a single 16-bit value, so it can only represent characters of the Basic Multilingual Plane, and up to Java 1.4 the platform offered no API support for characters outside it. For this reason, this library does not work for Supplementary Characters (i.e. characters from U+10000 to U+10FFFF). All references to such characters are silently ignored.

Starting from Java 1.5, Supplementary Characters are supported as well, represented as pairs of char values (surrogate pairs). However, supporting them will require changes to the present version of the library. (Java 1.5 was still in beta status when this note was written.)
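The limitation is easy to observe: a supplementary character such as U+1D11E (MUSICAL SYMBOL G CLEF) occupies two char values, so code that processes a String one char at a time sees two 16-bit surrogates instead of one character. The sketch below contrasts that view with the code point methods that Java 1.5 added to String and Character; the class name SupplementaryDemo is invented for illustration.

```java
// Shows how a supplementary character is stored in a Java String.
// U+1D11E MUSICAL SYMBOL G CLEF is encoded in UTF-16 as the surrogate
// pair D834 DD1E.
class SupplementaryDemo {
    static final String G_CLEF = new String(Character.toChars(0x1D11E));

    public static void main(String[] args) {
        // char view: what code handling one char at a time sees
        System.out.println(G_CLEF.length());                              // 2
        System.out.println(Character.isHighSurrogate(G_CLEF.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(G_CLEF.charAt(1)));   // true

        // code point view (methods available since Java 1.5)
        System.out.println(G_CLEF.codePointCount(0, G_CLEF.length()));   // 1
        System.out.println(Integer.toHexString(G_CLEF.codePointAt(0)));  // 1d11e
        System.out.println(Character.isSupplementaryCodePoint(G_CLEF.codePointAt(0))); // true
    }
}
```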

For more information, refer to the documentation of java.lang.Character in the JDK API.