char32_t type
The ISO C and POSIX standard creators then introduced the char32_t type. In ISO C 11, it was conceptually a “32-bit wide character” type. In ISO C 23, its semantics have been further specified: a char32_t value is a Unicode code point. Thus, the char32_t type is not affected by the problems that plague the wchar_t type.
The char32_t type and its API are defined in the <uchar.h> header file.
ISO C and POSIX specify only the basic functions for the char32_t type, namely conversion of a single character (mbrtoc32 and c32rtomb). For convenience, Gnulib adds an API for classification and case conversion of characters.
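As an illustration of the two standard conversion functions, here is a minimal sketch that decodes a multibyte string into char32_t values with mbrtoc32 and encodes a single value back with c32rtomb. The helper names decode_to_c32 and encode_one_c32 are hypothetical, chosen for this example only; they are not part of ISO C, POSIX, or Gnulib.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: decode the multibyte string SRC (in the encoding
   of the current locale) into at most N char32_t values stored in DST.
   Returns the number of values stored, or (size_t) -1 if SRC contains
   an invalid or incomplete multibyte sequence.  */
size_t
decode_to_c32 (const char *src, char32_t *dst, size_t n)
{
  assert (src != NULL && dst != NULL);

  mbstate_t state;
  memset (&state, 0, sizeof state);   /* start in the initial state */
  size_t len = strlen (src);
  size_t count = 0;

  while (len > 0 && count < n)
    {
      char32_t c;
      size_t ret = mbrtoc32 (&c, src, len, &state);
      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete sequence */
      if (ret == (size_t) -3)
        {
          /* Another character produced from already consumed bytes.  */
          dst[count++] = c;
          continue;
        }
      if (ret == 0)
        break;                        /* a null character was decoded */
      dst[count++] = c;
      src += ret;
      len -= ret;
    }
  return count;
}

/* Hypothetical helper: encode the single character C into BUF, which
   must have room for MB_LEN_MAX bytes.  Returns the number of bytes
   written, or (size_t) -1 if C cannot be represented.  */
size_t
encode_one_c32 (char32_t c, char *buf)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);
  return c32rtomb (buf, c, &state);
}
```

A program would typically call setlocale (LC_ALL, "") once at startup, so that both functions interpret bytes in the user's locale encoding rather than in the "C" locale.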
GNU libunistring can also be used on char32_t values. Since char32_t is the same as uint32_t, all u32_* functions of GNU libunistring are applicable to arrays of char32_t values.
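The type identity that this relies on can be checked at compile time. The following sketch assumes a C11 compiler; sketch_u32_strlen is a hypothetical stand-in that merely has the same shape as libunistring's u32_strlen (which takes a const uint32_t *), so that the example compiles without linking against libunistring. In real code the libunistring function would be called instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>

/* Fail the build on a platform where char32_t and uint32_t are
   distinct types, since passing char32_t arrays to functions that
   expect uint32_t arrays would then need a cast.  */
static_assert (_Generic ((char32_t) 0, uint32_t: 1, default: 0),
               "char32_t is not the same type as uint32_t");

/* Hypothetical stand-in for a u32_* function: counts the units of a
   null-terminated uint32_t string.  */
size_t
sketch_u32_strlen (const uint32_t *s)
{
  size_t n = 0;
  while (s[n] != 0)
    n++;
  return n;
}

/* A char32_t array is passed without any cast, because the two types
   are identical (as verified by the static_assert above).  */
size_t
c32_string_length (const char32_t *s)
{
  return sketch_u32_strlen (s);
}
```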
On glibc systems, use of the 32-bit wide strings (char32_t[]) is exactly as efficient as the use of the older wide strings (wchar_t[]). This is possible because on glibc, wchar_t values have always been 32-bit Unicode code points; mbrtoc32 is just an alias of mbrtowc. The Gnulib *c32* functions are optimized so that on glibc systems they immediately redirect to the corresponding *wc* functions.
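The equivalence described above can be observed with a small experiment. This is only a sketch: the function name same_first_char is hypothetical, and the result is expected to be true for every input on glibc, whereas on platforms where wchar_t is not a Unicode code point it may be false for non-ASCII input in some locales.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: decode the first character of S once with
   mbrtowc and once with mbrtoc32, and report whether both calls
   returned the same status and, on success, the same value.  */
bool
same_first_char (const char *s)
{
  mbstate_t st1, st2;
  memset (&st1, 0, sizeof st1);
  memset (&st2, 0, sizeof st2);

  wchar_t wc = 0;
  char32_t c32 = 0;
  size_t n = strlen (s) + 1;    /* include the terminating null byte */

  size_t r1 = mbrtowc (&wc, s, n, &st1);
  size_t r2 = mbrtoc32 (&c32, s, n, &st2);

  if (r1 != r2)
    return false;
  if (r1 == (size_t) -1 || r1 == (size_t) -2)
    return true;                /* both failed in the same way */
  return (char32_t) wc == c32;
}
```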
Gnulib implements the ISO C 23 semantics of char32_t when you import the ‘uchar-h-c23’ module. Without this module, it implements only the ISO C 11 semantics; the effect is that on some platforms (macOS, FreeBSD, NetBSD, Solaris) a char32_t value is the same as a wchar_t value, not a Unicode code point. Thus, when you want to pass char32_t values to GNU libunistring or to some Unicode-centric Gnulib functions, you need the ‘uchar-h-c23’ module in order to do so without portability problems.