The Unicode standard defines four normalization forms for Unicode strings. The following type is used to denote a normalization form.
An object of type uninorm_t denotes a Unicode normalization form. This is a scalar type; its values can be compared with ==.
The following constants denote the four normalization forms.
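Macro: uninorm_t UNINORM_NFD
Denotes Normalization form D: canonical decomposition.
Macro: uninorm_t UNINORM_NFC
Denotes Normalization form C: canonical decomposition, then canonical composition.
Macro: uninorm_t UNINORM_NFKD
Denotes Normalization form KD: compatibility decomposition.
Macro: uninorm_t UNINORM_NFKC
Denotes Normalization form KC: compatibility decomposition, then canonical composition.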
The following functions operate on uninorm_t objects.
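Function: bool uninorm_is_compat_decomposing (uninorm_t nf)
Tests whether the normalization form nf does compatibility decomposition.
Function: bool uninorm_is_composing (uninorm_t nf)
Tests whether the normalization form nf includes canonical composition.
Function: uninorm_t uninorm_decomposing_form (uninorm_t nf)
Returns the decomposing variant of the normalization form nf. This maps NFC, NFD → NFD and NFKC, NFKD → NFKD.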
The following functions apply a Unicode normalization form to a Unicode string.
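Function: uint8_t * u8_normalize (uninorm_t nf, const uint8_t *s, size_t n, uint8_t *resultbuf, size_t *lengthp)
Function: uint16_t * u16_normalize (uninorm_t nf, const uint16_t *s, size_t n, uint16_t *resultbuf, size_t *lengthp)
Function: uint32_t * u32_normalize (uninorm_t nf, const uint32_t *s, size_t n, uint32_t *resultbuf, size_t *lengthp)
Returns the specified normalization form of a string.
The resultbuf and lengthp arguments are as described in the chapter Conventions.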