

13.3 Normalization of strings

The Unicode standard defines four normalization forms for Unicode strings. The following type is used to denote a normalization form.

Type: uninorm_t

An object of type uninorm_t denotes a Unicode normalization form. This is a scalar type; its values can be compared with ==.

The following constants denote the four normalization forms.

Macro: uninorm_t UNINORM_NFD

Normalization form D: canonical decomposition.

Macro: uninorm_t UNINORM_NFC

Normalization form C: canonical decomposition, then canonical composition.

Macro: uninorm_t UNINORM_NFKD

Normalization form KD: compatibility decomposition.

Macro: uninorm_t UNINORM_NFKC

Normalization form KC: compatibility decomposition, then canonical composition.
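To make the difference between canonical and compatibility decomposition concrete, here is a toy C sketch that hard-codes the decompositions of just two characters. U+00E9 (é) has the canonical decomposition U+0065 U+0301 (e followed by COMBINING ACUTE ACCENT). U+FB01 (the "fi" ligature) has only a compatibility decomposition, to U+0066 U+0069. The table `toy_table` and the function `toy_decompose` are hypothetical illustrations, not part of libunistring. Real decomposition is recursive, draws on the full Unicode Character Database, and also reorders combining marks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-entry decomposition table (not libunistring data). */
struct toy_decomp {
  uint32_t cp;      /* character being decomposed */
  bool compat;      /* true: compatibility mapping only; false: canonical */
  size_t len;       /* number of code points in the mapping */
  uint32_t to[2];   /* the mapping itself */
};

static const struct toy_decomp toy_table[] = {
  { 0x00E9, false, 2, { 0x0065, 0x0301 } },  /* e-acute -> e + COMBINING ACUTE */
  { 0xFB01, true,  2, { 0x0066, 0x0069 } },  /* fi ligature -> f + i */
};

/* Decomposes one code point into OUT (room for 2 code points) and returns
   the number of code points written.  Canonical mappings always apply;
   compatibility mappings apply only when COMPAT is true (NFKD-style). */
static size_t
toy_decompose (uint32_t cp, bool compat, uint32_t out[2])
{
  for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++)
    if (toy_table[i].cp == cp && (compat || !toy_table[i].compat))
      {
        for (size_t j = 0; j < toy_table[i].len; j++)
          out[j] = toy_table[i].to[j];
        return toy_table[i].len;
      }
  out[0] = cp;  /* no applicable mapping: the character maps to itself */
  return 1;
}
```

Under the canonical (NFD-style) setting the ligature survives unchanged, while the compatibility (NFKD-style) setting splits it into two letters; é is decomposed in both settings.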

The following functions operate on uninorm_t objects.

Function: bool uninorm_is_compat_decomposing (uninorm_t nf)

Tests whether the normalization form nf does compatibility decomposition.

Function: bool uninorm_is_composing (uninorm_t nf)

Tests whether the normalization form nf includes canonical composition.

Function: uninorm_t uninorm_decomposing_form (uninorm_t nf)

Returns the decomposing variant of the normalization form nf. This maps NFC, NFD → NFD and NFKC, NFKD → NFKD.
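One way to read these three functions is to view each normalization form as two independent yes/no properties: whether its decomposition step is compatibility (K) or canonical, and whether a canonical composition step follows. The C sketch below models this with a hypothetical enum `toy_nf` and `toy_`-prefixed functions. It does not reflect how libunistring actually represents `uninorm_t`; it only illustrates the relationships stated above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for uninorm_t: one value per normalization form. */
enum toy_nf { TOY_NFD, TOY_NFC, TOY_NFKD, TOY_NFKC };

/* Mirrors uninorm_is_compat_decomposing: true for NFKD and NFKC. */
static bool
toy_is_compat_decomposing (enum toy_nf nf)
{
  return nf == TOY_NFKD || nf == TOY_NFKC;
}

/* Mirrors uninorm_is_composing: true for NFC and NFKC. */
static bool
toy_is_composing (enum toy_nf nf)
{
  return nf == TOY_NFC || nf == TOY_NFKC;
}

/* Mirrors uninorm_decomposing_form: drops the composition step but keeps
   the kind of decomposition, so NFC,NFD -> NFD and NFKC,NFKD -> NFKD. */
static enum toy_nf
toy_decomposing_form (enum toy_nf nf)
{
  return toy_is_compat_decomposing (nf) ? TOY_NFKD : TOY_NFD;
}
```

In this model, the decomposing form of nf agrees with nf on the compatibility property and always answers "no" to the composing property.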

The following functions apply a Unicode normalization form to a Unicode string.

Function: uint8_t * u8_normalize (uninorm_t nf, const uint8_t *s, size_t n, uint8_t *resultbuf, size_t *lengthp)
Function: uint16_t * u16_normalize (uninorm_t nf, const uint16_t *s, size_t n, uint16_t *resultbuf, size_t *lengthp)
Function: uint32_t * u32_normalize (uninorm_t nf, const uint32_t *s, size_t n, uint32_t *resultbuf, size_t *lengthp)

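Returns the normalization form nf of the string s, which consists of n units (bytes for u8_normalize, 16-bit units for u16_normalize, 32-bit units for u32_normalize).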

The resultbuf and lengthp arguments are as described in chapter Conventions.
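The following C sketch imitates the calling convention of u32_normalize on a deliberately tiny problem. It knows only the pair é (U+00E9) and e (U+0065) + COMBINING ACUTE ACCENT (U+0301). The function `toy_u32_normalize` and its `compose` flag are hypothetical. Real normalization also reorders combining marks and uses the full Unicode data. What the sketch tries to show is the resultbuf/lengthp pattern. It assumes the usual convention: the result is stored in resultbuf when resultbuf is non-NULL and the result fits in *lengthp units; otherwise it goes into freshly malloc'ed memory; *lengthp receives the result length; on failure NULL is returned with errno set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy: handles only U+00E9 <-> U+0065 U+0301.
   COMPOSE selects NFC-like output; otherwise the output is NFD-like. */
static uint32_t *
toy_u32_normalize (bool compose, const uint32_t *s, size_t n,
                   uint32_t *resultbuf, size_t *lengthp)
{
  /* Decomposition at most doubles the length, so work in a scratch buffer
     of 2*n units first; the final size is known only afterwards. */
  if (n > (SIZE_MAX / sizeof (uint32_t) - 1) / 2)
    {
      errno = ENOMEM;
      return NULL;
    }
  uint32_t *tmp = malloc ((2 * n + 1) * sizeof (uint32_t)); /* +1: no malloc(0) */
  if (tmp == NULL)
    return NULL;  /* malloc sets errno on POSIX systems */
  size_t len = 0;

  /* Step 1: canonical decomposition. */
  for (size_t i = 0; i < n; i++)
    if (s[i] == 0x00E9)
      {
        tmp[len++] = 0x0065;
        tmp[len++] = 0x0301;
      }
    else
      tmp[len++] = s[i];

  /* Step 2 (NFC-like only): canonical composition of e + U+0301. */
  if (compose)
    {
      size_t out = 0;
      for (size_t i = 0; i < len; i++)
        if (tmp[i] == 0x0065 && i + 1 < len && tmp[i + 1] == 0x0301)
          {
            tmp[out++] = 0x00E9;
            i++;  /* skip the combining accent just consumed */
          }
        else
          tmp[out++] = tmp[i];
      len = out;
    }

  /* Step 3: deliver the result following the resultbuf/lengthp pattern. */
  uint32_t *result;
  if (resultbuf != NULL && len <= *lengthp)
    result = resultbuf;
  else
    {
      result = malloc ((len + 1) * sizeof (uint32_t));
      if (result == NULL)
        {
          free (tmp);
          return NULL;
        }
    }
  memcpy (result, tmp, len * sizeof (uint32_t));
  free (tmp);
  *lengthp = len;
  return result;
}
```

A caller that passes its own buffer should therefore free the returned pointer only when it differs from that buffer. With libunistring itself, the corresponding call would be `u32_normalize (UNINORM_NFC, s, n, buf, &len)`, and the same free-only-if-different rule applies.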

