A locale is a set of cultural conventions. According to POSIX, a program has, at any moment, exactly one locale designated as the “current locale”. (POSIX also supports one locale per thread, but this feature is not yet universally implemented and not widely used.) The locale is partitioned into several aspects, called the “categories” of the locale. The main categories are:
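LC_CTYPE category: the character encoding and the character properties.
LC_COLLATE category: the sorting rules for text.
LC_MESSAGES category: the language-specific translations of messages.
LC_NUMERIC category: the formatting rules for numbers, such as the decimal separator.
LC_MONETARY category: the formatting rules for amounts of money.
LC_TIME category: the formatting of date and time.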
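Because the categories are separate, a program can set or query one of them without touching the others. The following is a minimal sketch using only the standard C functions setlocale and localeconv; the helper name decimal_separator_in is hypothetical and not part of any library.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switch only the LC_NUMERIC category to locale_name
   and copy the decimal separator it prescribes into buf.
   Returns 0 on success, -1 if the locale is unavailable or buf is too small.
   All other categories (LC_CTYPE, LC_TIME, ...) keep their current setting. */
int decimal_separator_in(const char *locale_name, char *buf, size_t buflen)
{
    assert(locale_name != NULL && buf != NULL && buflen > 0);

    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return -1;

    /* localeconv() returns a pointer to static data that later calls may
       overwrite, so the separator is copied out immediately. */
    const char *sep = localeconv()->decimal_point;
    if (strlen(sep) >= buflen)
        return -1;
    strcpy(buf, sep);
    return 0;
}
```

In the "C" locale the separator is "."; a locale such as "de_DE.UTF-8", if installed on the system, would typically yield ",".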
In particular, the LC_CTYPE category of the current locale determines the character encoding. This is the encoding of ‘char *’ strings. We also call it the “locale encoding”. GNU libunistring has a function, locale_charset, that returns a standardized (platform independent) name for this encoding.
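To illustrate what “the encoding determined by LC_CTYPE” means in practice, here is a sketch that queries it with the POSIX function nl_langinfo rather than with libunistring. The helper name locale_encoding_name is hypothetical. Note the difference from locale_charset: the name returned by nl_langinfo (CODESET) is whatever the platform chooses to call the encoding, so it may differ between systems for the same encoding.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switch the LC_CTYPE category to locale_name
   (pass "" to use the user's environment) and copy the platform's name
   for the resulting locale encoding into buf.
   Returns 0 on success, -1 if the locale is unavailable or buf is too small. */
int locale_encoding_name(const char *locale_name, char *buf, size_t buflen)
{
    assert(locale_name != NULL && buf != NULL && buflen > 0);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* CODESET names the encoding of the current LC_CTYPE category.
       The spelling is platform dependent, unlike locale_charset's result. */
    const char *codeset = nl_langinfo(CODESET);
    if (strlen(codeset) >= buflen)
        return -1;
    strcpy(buf, codeset);
    return 0;
}
```

A program would normally call this once at startup with "" as the locale name, so that the encoding reflects the user's environment settings.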
All locale encodings used on glibc systems are essentially ASCII compatible: Most graphic ASCII characters have the same representation, as a single byte, in that encoding as in ASCII.
Among the possible locale encodings are UTF-8 and GB18030. Both can represent any Unicode character as a sequence of bytes. UTF-8 is used in most of the world, whereas GB18030 is used in the People’s Republic of China, because it is backward compatible with the GB2312 encoding that was used there earlier.
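As an illustration of how an encoding can represent any Unicode character as a sequence of bytes, the following sketch encodes a single code point in UTF-8. The function name utf8_encode is hypothetical; it is a standalone example, not a libunistring function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode the Unicode code point cp as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is not a valid
   Unicode character (a surrogate, or beyond U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                        /* ASCII: one byte, unchanged */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                       /* two bytes */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* four bytes */
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+0041 (LATIN CAPITAL LETTER A) yields the single byte 0x41, the same as in ASCII, while U+20AC (EURO SIGN) yields the three bytes 0xE2 0x82 0xAC.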
The legacy locale encodings, ISO-8859-15 (which supplanted ISO-8859-1 in most of Europe), ISO-8859-2, KOI8-R, EUC-JP, etc., are still in use in some places, though.
UTF-16 and UTF-32 are not used as locale encodings, because they are not ASCII compatible.
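One concrete consequence of this incompatibility can be sketched as follows: in UTF-16, every ASCII character is accompanied by a zero byte, which the C string functions treat as the end of a ‘char *’ string. The array and function names below are hypothetical and serve only this illustration.

```c
#include <assert.h>
#include <string.h>

/* "Hi" in UTF-8: the same two bytes as in ASCII, then the terminating NUL. */
const char hi_utf8[] = { 0x48, 0x69, 0x00 };

/* "Hi" in UTF-16LE: each ASCII character is followed by a 0x00 byte,
   and the terminator is itself two bytes wide. */
const char hi_utf16le[] = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 };

/* Hypothetical helper: the length of s as seen by ordinary 'char *'
   functions, i.e. the number of bytes before the first NUL byte. */
size_t bytes_before_first_nul(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

Applied to hi_utf8 the helper reports 2 bytes, but applied to hi_utf16le it reports only 1, because the second byte of the UTF-16 data is already zero.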