Unicode strings (GNU libunistring)

libunistring supports Unicode strings in three representations:

UTF-8 strings, through the type ‘uint8_t *’. The units are bytes (uint8_t).
UTF-16 strings, through the type ‘uint16_t *’. The units are 16-bit memory words (uint16_t).
UTF-32 strings, through the type ‘uint32_t *’. The units are 32-bit memory words (uint32_t).

As with C strings, there are two variants:

Unicode strings with a terminating NUL character are represented as a pointer to the first unit of the string. There is a unit containing a 0 value at the end. It is considered part of the string for all memory allocation purposes, but is not considered part of the string for all other logical purposes.
Unicode strings where embedded NUL characters are allowed. These are represented by a pointer to the first unit and the number of units (not bytes!) of the string. In this setting, there is no trailing zero-valued unit used as an “end marker”.