An Emacs character set, or charset, is a set of characters
in which each character is assigned a numeric code point. (The
Unicode Standard calls this a coded character set.) Each Emacs
charset has a name which is a symbol. A single character can belong
to any number of different character sets, but it will generally have
a different code point in each charset. Examples of character sets
include ascii, iso-8859-1, greek-iso8859-7, and
windows-1255. The code point assigned to a character in a
charset is usually different from its code point used in Emacs buffers
and strings.
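The following is a minimal sketch, assuming a standard Emacs build where these charsets are predefined. It asks several charsets which code point each one assigns to the character é, whose Emacs code is #xE9. The helper my-code-points-of is hypothetical and is not part of Emacs. It relies on encode-char, which is described later in this section.

```elisp
;; my-code-points-of is a hypothetical helper, not part of Emacs.
(defun my-code-points-of (char charsets)
  "Return a list of (CHARSET CODE-POINT) entries for CHAR.
CODE-POINT is nil when CHARSET does not contain CHAR."
  (mapcar (lambda (charset) (list charset (encode-char char charset)))
          charsets))

;; #xE9 is the Emacs character code of é (LATIN SMALL LETTER E WITH ACUTE).
(my-code-points-of #xE9 '(unicode iso-8859-1 latin-iso8859-1 greek-iso8859-7))
;; => ((unicode 233) (iso-8859-1 233) (latin-iso8859-1 105) (greek-iso8859-7 nil))
```

The same character has code point 233 in two charsets and 105 in a third. It has no code point at all in greek-iso8859-7.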
Emacs defines several special character sets. The character set unicode
includes all the characters whose Emacs code points are
in the range 0..#x10FFFF. The character set emacs
includes all ASCII and non-ASCII characters.
Finally, the eight-bit
charset includes the 8-bit raw bytes;
Emacs uses it to represent raw bytes encountered in text.
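A short sketch can probe the boundaries of two of these special charsets using decode-char and encode-char, which are described later in this section. It assumes the usual Emacs representation of raw bytes #x80..#xFF as the characters #x3FFF80..#x3FFFFF.

```elisp
;; unicode code points coincide with Emacs character codes and end at #x10FFFF.
(decode-char 'unicode #x10FFFF)   ; => 1114111 (#x10FFFF)
(decode-char 'unicode #x110000)   ; => nil, beyond the unicode charset

;; eight-bit holds raw bytes #x80..#xFF; Emacs represents each one as a
;; character in the range #x3FFF80..#x3FFFFF.
(decode-char 'eight-bit #x80)     ; => 4194176 (#x3FFF80)
(encode-char #x3FFFFF 'eight-bit) ; => 255
```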
This function returns t if object is a symbol that names a character set,
and nil otherwise.
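In Emacs this predicate is charsetp. The sketch below tests a few objects. The symbol no-such-charset is a hypothetical name chosen only because no charset uses it.

```elisp
(charsetp 'ascii)            ; => t
(charsetp 'greek-iso8859-7)  ; => t
(charsetp 'no-such-charset)  ; => nil: a symbol, but not a charset name
(charsetp "ascii")           ; => nil: a string, not a symbol
```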
This variable’s value is a list of all defined character set names.
This function returns a list of all defined character sets ordered by
their priority. If highestp is non-nil, the function
returns only the name of the highest-priority character set.
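The following sketch assumes the Emacs names charset-list, which is the variable, and charset-priority-list, which is the function. It shows that the single charset returned for a non-nil argument is the head of the full priority list.

```elisp
;; charset-list is a variable holding every defined charset name.
(and (memq 'unicode charset-list) t)   ; => t

;; A non-nil HIGHESTP returns only the name at the head of the
;; priority-ordered list.
(eq (charset-priority-list t)
    (car (charset-priority-list)))     ; => t
```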
This function makes charsets the highest-priority character sets, in the
order given: the first of charsets receives the highest priority.
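The function described here is set-charset-priority. Because it changes a global setting, this sketch saves the current order, raises two charsets, and reports the new top two. It then reapplies the saved order by passing it back to set-charset-priority. The wrapper my-priority-demo is hypothetical.

```elisp
(defun my-priority-demo ()
  "Hypothetical helper: show the effect of set-charset-priority.
Return the two highest-priority charsets after the change, then
reapply the previously saved priority order."
  (let ((saved (charset-priority-list)))
    (unwind-protect
        (progn
          ;; The first argument receives the highest priority.
          (set-charset-priority 'iso-8859-1 'greek-iso8859-7)
          (list (charset-priority-list t)
                (cadr (charset-priority-list))))
      ;; Runs even if the body signals an error.
      (apply #'set-charset-priority saved))))

(my-priority-demo)   ; => (iso-8859-1 greek-iso8859-7)
```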
This function returns the name of the character set of highest
priority that character belongs to. ASCII characters
are an exception: for them, this function always returns ascii.
If restriction is non-nil, it should be a list of
charsets to search. Alternatively, it can be a coding system, in
which case the returned charset must be supported by that coding
system (see Coding Systems).
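The function described above is char-charset. This sketch shows both forms of restriction. It assumes the iso-latin-1 coding system, which supports only the iso-8859-1 charset, and uses #x3B1 (α) as a character outside Latin-1. The comment on list restrictions describes how current Emacs handles them: it returns the first listed charset that contains the character.

```elisp
(char-charset ?a)                             ; => ascii
;; A list restriction: the first listed charset that contains é wins.
(char-charset #xE9 '(ascii iso-8859-1))       ; => iso-8859-1
(char-charset #xE9 '(ascii greek-iso8859-7))  ; => nil
;; A coding-system restriction: α is not encodable in iso-latin-1.
(char-charset #x3B1 'iso-latin-1)             ; => nil
```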
This function returns the property list of the character set charset. Although charset is a symbol, this is not the same as the property list of that symbol. Charset properties include important information about the charset, such as its documentation string, short name, etc.
This function sets the propname property of charset to the given value.
This function returns the value of charset’s property propname.
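The three functions above are charset-plist, put-charset-property, and get-charset-property. This sketch shows that reading a property from the plist or through the accessor gives the same value. It then stores and reads back a property. The property name :my-note is hypothetical and used only for illustration. Note that the sketch modifies the iso-8859-1 plist in the running session.

```elisp
;; Reading a property from the plist and through the accessor gives
;; the same value.
(equal (plist-get (charset-plist 'iso-8859-1) :short-name)
       (get-charset-property 'iso-8859-1 :short-name))   ; => t

;; :my-note is a hypothetical property name used only for illustration.
(put-charset-property 'iso-8859-1 :my-note "added by a demo")
(get-charset-property 'iso-8859-1 :my-note)              ; => "added by a demo"
```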
This command displays a list of characters in the character set charset.
Emacs can convert between its internal representation of a character and the character’s codepoint in a specific charset. The following two functions support these conversions.
This function decodes the character assigned to code-point
in charset into the corresponding Emacs character, and returns
it. If charset doesn’t contain a character with that code point,
the value is nil.
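The function described above is decode-char. This sketch decodes one code point in two charsets that assign é different code points, and one code point that lies outside iso-8859-1. It assumes the standard definition of latin-iso8859-1, which places Emacs characters #xA0..#xFF at code points #x20..#x7F.

```elisp
(decode-char 'iso-8859-1 #xE9)       ; => 233, the character é
(decode-char 'latin-iso8859-1 #x69)  ; => 233, the same character at another code point
(decode-char 'iso-8859-1 #x100)      ; => nil, not a code point of iso-8859-1
```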
For backward compatibility, if code-point doesn’t fit in a Lisp
fixnum (see most-positive-fixnum), it can be
specified as a cons cell (high . low), where
low is the low 16 bits of the value and high is the
high 16 bits. This usage is obsolescent.
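Under the bit layout just described, (high . low) stands for high × 65536 + low. The sketch below writes the same code point in both forms to show the equivalence. New code should pass the plain integer.

```elisp
;; (1 . #xF600) means 1 * 65536 + #xF600 = #x1F600.
(decode-char 'unicode '(1 . #xF600))  ; => 128512 (#x1F600), obsolescent form
(decode-char 'unicode #x1F600)        ; => 128512, preferred form
```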
This function returns the code point assigned to the character
char in charset. If
charset doesn’t have a codepoint for char, the value is nil.
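The function described above is encode-char. This sketch pairs it with decode-char to check whether a character survives a round trip through a charset. The helper my-round-trips-p is hypothetical.

```elisp
(encode-char ?A 'ascii)    ; => 65
(encode-char #xE9 'ascii)  ; => nil, é is not in ascii

(defun my-round-trips-p (charset char)
  "Hypothetical helper: non-nil if CHAR encodes into CHARSET and decodes back."
  (let ((code (encode-char char charset)))
    (and code (eq (decode-char charset code) char))))

(my-round-trips-p 'latin-iso8859-1 #xE9)  ; => t
(my-round-trips-p 'ascii #xE9)            ; => nil
```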
The following function comes in handy for applying a certain function to all or part of the characters in a charset:
This function calls function for the characters in charset. function
is called with two arguments. The first one is a cons cell
(from . to), where from and to
indicate a range of characters contained in charset. The second
argument passed to function is arg, or nil if
arg is omitted.
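The function described above is map-charset-chars. This sketch counts the characters in a charset by summing the sizes of the inclusive ranges passed to function. It ignores the second argument. The helper my-charset-size is hypothetical.

```elisp
(defun my-charset-size (charset)
  "Hypothetical helper: count the characters in CHARSET."
  (let ((count 0))
    (map-charset-chars
     (lambda (range _arg)
       ;; RANGE is (FROM . TO), an inclusive range of Emacs character codes.
       (setq count (+ count (1+ (- (cdr range) (car range))))))
     charset)
    count))

(my-charset-size 'ascii)       ; => 128
(my-charset-size 'iso-8859-1)  ; => 256
```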
By default, the range of codepoints passed to function includes
all the characters in charset, but optional arguments
from-code and to-code limit that to the range of
characters between these two codepoints of charset. If either
of them is nil, it defaults to the first or last codepoint of
charset, respectively. Note that from-code and
to-code are charset’s codepoints, not the Emacs codes of
characters; by contrast, the values from and to in the
cons cell passed to function are Emacs character codes.
Those Emacs character codes are either Unicode code points, or Emacs
internal code points that extend Unicode and are beyond the Unicode
range of characters 0..#x10FFFF
(see Text Representations).
The latter happens rarely, with legacy CJK charsets for codepoints of
charset which specify characters not yet unified with Unicode.
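To see that from-code and to-code are charset code points while the ranges passed to function are Emacs character codes, this sketch uses latin-iso8859-1. It assumes the standard definition, which places Emacs characters #xA0..#xFF at code points #x20..#x7F. The helper my-charset-ranges is hypothetical.

```elisp
(defun my-charset-ranges (charset &optional from-code to-code)
  "Hypothetical helper: collect the (FROM . TO) ranges passed by
map-charset-chars for CHARSET between FROM-CODE and TO-CODE."
  (let ((ranges nil))
    (map-charset-chars (lambda (range _arg) (push range ranges))
                       charset nil from-code to-code)
    (nreverse ranges)))

;; Whole charset: code points #x20..#x7F come back as Emacs codes #xA0..#xFF.
(my-charset-ranges 'latin-iso8859-1)            ; => ((160 . 255))
;; Limits #x60..#x7F are charset code points; the result is in Emacs codes.
(my-charset-ranges 'latin-iso8859-1 #x60 #x7F)  ; => ((224 . 255))
```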