The char32_t type (GNU Gnulib)


16.2.3 The `char32_t` type

The creators of the ISO C and POSIX standards then introduced the char32_t type. In ISO C 11, it was conceptually a “32-bit wide character” type. In ISO C 23, its semantics have been specified further: a char32_t value is a Unicode code point.

Thus, the char32_t type is not affected by the problems that plague the wchar_t type.
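As a rough illustration of what "a char32_t value is a Unicode code point" means under the ISO C 23 semantics, the following sketch checks whether a value lies inside the Unicode code space (0 to 0x10FFFF) and whether it falls in the surrogate range. The helper names c32_is_code_point and c32_is_surrogate are hypothetical; they are not part of ISO C, POSIX, or Gnulib.

```c
#include <stdbool.h>
#include <uchar.h>   /* char32_t */

/* Hypothetical helper: true if C lies in the Unicode code space,
   that is, in the range 0 .. 0x10FFFF.  */
bool
c32_is_code_point (char32_t c)
{
  return c <= 0x10FFFF;
}

/* Hypothetical helper: true if C is a surrogate code point
   (0xD800 .. 0xDFFF), which never denotes a character by itself.  */
bool
c32_is_surrogate (char32_t c)
{
  return c >= 0xD800 && c <= 0xDFFF;
}
```

A check like this is only meaningful when char32_t values really are Unicode code points, which is what the ISO C 23 semantics specify.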

The char32_t type and its API are defined in the <uchar.h> header file.

ISO C and POSIX specify only the basic functions for the char32_t type, namely the conversion of a single character (mbrtoc32 and c32rtomb). For convenience, Gnulib adds an API for classification and case conversion of characters.
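The two standard conversion functions can be used as in the following minimal sketch, which decodes a multibyte string (in the current locale's encoding) into char32_t values and encodes them back. The function names decode_to_c32 and encode_from_c32 are hypothetical wrappers written for this illustration; only mbrtoc32 and c32rtomb come from <uchar.h>.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>
#include <string.h>
#include <uchar.h>    /* char32_t, mbrtoc32, c32rtomb */
#include <wchar.h>    /* mbstate_t */

/* Hypothetical helper: decode the multibyte string SRC into DST, which has
   room for DST_LEN char32_t values.  Returns the number of characters
   stored, or (size_t) -1 on an invalid or incomplete sequence or if DST
   is too small.  */
size_t
decode_to_c32 (const char *src, char32_t *dst, size_t dst_len)
{
  assert (src != NULL && dst != NULL);
  mbstate_t state;
  memset (&state, 0, sizeof state);   /* initial conversion state */
  size_t remaining = strlen (src);
  size_t count = 0;
  while (remaining > 0)
    {
      char32_t c;
      size_t n = mbrtoc32 (&c, src, remaining, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete sequence */
      if (n == (size_t) -3)
        n = 0;                        /* character stored, no bytes consumed */
      else if (n == 0)
        break;                        /* null character: stop */
      if (count == dst_len)
        return (size_t) -1;           /* destination too small */
      dst[count++] = c;
      src += n;
      remaining -= n;
    }
  return count;
}

/* Hypothetical helper: encode SRC_LEN char32_t values into the
   NUL-terminated multibyte string DST of capacity DST_LEN bytes.
   Returns the number of bytes written (excluding the NUL), or
   (size_t) -1 on an unencodable character or if DST is too small.  */
size_t
encode_from_c32 (const char32_t *src, size_t src_len,
                 char *dst, size_t dst_len)
{
  assert (src != NULL && dst != NULL && dst_len > 0);
  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t used = 0;
  for (size_t i = 0; i < src_len; i++)
    {
      char buf[MB_LEN_MAX];
      size_t n = c32rtomb (buf, src[i], &state);
      if (n == (size_t) -1)
        return (size_t) -1;           /* not representable in this locale */
      if (used + n >= dst_len)
        return (size_t) -1;           /* keep room for the terminating NUL */
      memcpy (dst + used, buf, n);
      used += n;
    }
  dst[used] = '\0';
  return used;
}
```

Both functions depend on the current locale, so a program that handles non-ASCII text would normally call setlocale (LC_ALL, "") before using them.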

GNU libunistring can also be used on char32_t values. Since char32_t is the same as uint32_t, all u32_* functions of GNU libunistring are applicable to arrays of char32_t values.

On glibc systems, the use of 32-bit wide strings (char32_t[]) is exactly as efficient as the use of the older wide strings (wchar_t[]). This is possible because on glibc, wchar_t values have always been 32-bit values that hold Unicode code points. mbrtoc32 is just an alias of mbrtowc. The Gnulib *c32* functions are optimized so that on glibc systems they immediately redirect to the corresponding *wc* functions.

Gnulib implements the ISO C 23 semantics of char32_t when you import the ‘uchar-h-c23’ module. Without this module, it implements only the ISO C 11 semantics; the effect is that on some platforms (macOS, FreeBSD, NetBSD, Solaris) a char32_t value is the same as a wchar_t value, not a Unicode code point. Thus, if you want to pass char32_t values to GNU libunistring or to some Unicode-centric Gnulib functions, you need the ‘uchar-h-c23’ module in order to do so without portability problems.