Characters are objects that represent human-readable characters
such as letters and digits. More precisely, a character
represents a Unicode scalar value. Each character has an integer value
in the range 0 to #x10FFFF (excluding the range #xD800 to #xDFFF
used for surrogate code points).
Note: Unicode distinguishes between glyphs, which are printed for humans to read, and characters, which are abstract entities that map to glyphs (sometimes in a way that’s sensitive to surrounding characters). Furthermore, different sequences of scalar values sometimes correspond to the same character. The relationships among scalar values, characters, and glyphs are subtle and complex.
Despite this complexity, most things that a literate human would call a “character” can be represented by a single Unicode scalar value (although several sequences of Unicode scalar values may represent that same character). For example, Roman letters, Cyrillic letters, Hebrew consonants, and most Chinese characters fall into this category.
Unicode scalar values exclude the range #xD800 to #xDFFF, which is part of the range of Unicode code points. However, the Unicode code points in this range, the so-called surrogates, are an artifact of the UTF-16 encoding, and can only appear in specific Unicode encodings, and even then only in pairs that encode scalar values. Consequently, all characters represent code points, but the surrogate code points do not have representations as characters.
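The range rule above can be sketched as a small predicate. The name unicode-scalar-value? is hypothetical (it is not a Kawa or R7RS procedure); the sketch only restates the two intervals given in the text.

```scheme
;; Hypothetical helper, not part of Kawa or R7RS: true when n is an
;; exact integer inside the Unicode scalar value range, i.e. a code
;; point that is not a surrogate.
(define (unicode-scalar-value? n)
  (and (exact-integer? n)
       (or (<= 0 n #xD7FF)
           (<= #xE000 n #x10FFFF))))

(unicode-scalar-value? #x41)      ; ⇒ #t  a Roman letter
(unicode-scalar-value? #xD800)    ; ⇒ #f  first surrogate code point
(unicode-scalar-value? #x10FFFF)  ; ⇒ #t  largest scalar value
(unicode-scalar-value? #x110000)  ; ⇒ #f  beyond the code point range
```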
Type: character
A Unicode code point: normally a Unicode scalar value, but possibly a surrogate. This is implemented using a 32-bit int. When an object is needed (i.e. the boxed representation), it is implemented as an instance of gnu.text.Char.
Type: character-or-eof
A character or the special #!eof value (used to indicate end-of-file when reading from a port). This is implemented using a 32-bit int, where the value -1 indicates end-of-file. When an object is needed, it is implemented as an instance of gnu.text.Char or the special #!eof object.
Type: char
A UTF-16 code unit. Same as the Java primitive char type. Considered to be a sub-type of character. When an object is needed, it is implemented as an instance of java.lang.Character. Note the unfortunate inconsistency (for historical reasons) of char boxed as Character vs character boxed as Char.
Characters are written using the notation #\character (which stands for the given character); #\xhex-scalar-value (the character whose scalar value is the given hex integer); or #\character-name (a character with a given name):
    character ::= #\any-character
        | #\character-name
        | #\xhex-scalar-value
        | #\Xhex-scalar-value
The following character-name forms are recognized:
#\alarm
    #\x0007 - the alarm (bell) character
#\backspace
    #\x0008
#\delete
#\del
#\rubout
    #\x007f - the delete or rubout character
#\escape
#\esc
    #\x001b
#\newline
#\linefeed
    #\x000a - the linefeed character
#\null
#\nul
    #\x0000 - the null character
#\page
    #\x000c - the formfeed character
#\return
    #\x000d - the carriage return character
#\space
    #\x0020 - the preferred way to write a space
#\tab
    #\x0009 - the tab character
#\vtab
    #\x000b - the vertical tabulation character
#\ignorable-char
    A special character value, but it is not a Unicode code point. It is a special value returned when an index refers to the second char (code point) of a surrogate pair, and which should be ignored. (When writing a character to a string or file, it will be written as one or two char values. The exception is #\ignorable-char, for which zero char values are written.)
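As an illustration of the three notations, the following sketch compares a few spellings of the same character. It assumes a reader that accepts the character names listed above; names such as #\del, #\rubout and #\nul are not all part of standard R7RS.

```scheme
;; Each comparison names one character in more than one notation.
(char=? #\A #\x41)                      ; ⇒ #t  literal vs. hex scalar value
(char=? #\space #\x20)                  ; ⇒ #t  name vs. hex scalar value
(char=? #\delete #\del #\rubout #\x7f)  ; ⇒ #t  three names, one character
(char=? #\null #\nul #\x0)              ; ⇒ #t

;; #\x with no hex digits after it is simply the letter x.
(char=? #\x #\x78)                      ; ⇒ #t

(char->integer #\tab)                   ; ⇒ 9
```

Leading zeros in the hex form are optional, so #\x41 and #\x0041 denote the same character.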
Procedure: char? obj
Return #t if obj is a character, #f otherwise. (The obj can be any character, not just a 16-bit char.)
Procedure: char->integer char
Procedure: integer->char sv
sv should be a Unicode scalar value, i.e., a non-negative exact integer object in [0, #xD7FF] union [#xE000, #x10FFFF]. (Kawa also allows values in the surrogate range.)

Given a character, char->integer returns its Unicode scalar value as an exact integer object. For a Unicode scalar value sv, integer->char returns its associated character.

    (integer->char 32) ⇒ #\space
    (char->integer (integer->char 5000)) ⇒ 5000
    (integer->char #\xD800) ⇒ throws ClassCastException

Performance note: A call to char->integer is compiled as casting the argument to a character, and then re-interpreting that value as an int. A call to integer->char is compiled as casting the argument to an int, and then re-interpreting that value as a character. If the argument is the right type, no code is emitted: the value is just re-interpreted as the result type.
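A short sketch of the two conversions working together. The helper rotate-upper is hypothetical and assumes its argument is an upper-case ASCII letter; it is only meant to show arithmetic on scalar values.

```scheme
;; Hypothetical helper: shift an upper-case ASCII letter by n
;; positions, wrapping around after Z.
(define (rotate-upper ch n)
  (let ((base (char->integer #\A)))
    (integer->char
      (+ base (modulo (+ (- (char->integer ch) base) n) 26)))))

(rotate-upper #\A 1)        ; ⇒ #\B
(rotate-upper #\Z 1)        ; ⇒ #\A
(char->integer #\λ)         ; ⇒ 955
(char->integer (integer->char #x1F600))  ; ⇒ 128512, outside the 16-bit char range
```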
Procedure: char=? char1 char2 char3 …
Procedure: char<? char1 char2 char3 …
Procedure: char>? char1 char2 char3 …
Procedure: char<=? char1 char2 char3 …
Procedure: char>=? char1 char2 char3 …
These procedures impose a total ordering on the set of characters according to their Unicode scalar values.
    (char<? #\z #\ß) ⇒ #t
    (char<? #\z #\Z) ⇒ #f

Performance note: This is compiled as if converting each argument using char->integer (which requires no code) and then using the corresponding int comparison.
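Because the comparison procedures accept more than two arguments, a range test can be written as a single call. The helper ascii-digit? below is hypothetical and only illustrates that use.

```scheme
;; Hypothetical helper: true when ch lies between #\0 and #\9
;; in scalar value order.
(define (ascii-digit? ch)
  (char<=? #\0 ch #\9))

(ascii-digit? #\7)     ; ⇒ #t
(ascii-digit? #\a)     ; ⇒ #f
(char<? #\a #\b #\c)   ; ⇒ #t  every adjacent pair is increasing
(char<? #\a #\c #\b)   ; ⇒ #f  #\c does not precede #\b
(char<? #\Z #\a)       ; ⇒ #t  90 < 97, upper case sorts first
```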
Procedure: digit-value char
This procedure returns the numeric value (0 to 9) of its argument if it is a numeric digit (that is, if char-numeric? returns #t), or #f on any other character.

    (digit-value #\3) ⇒ 3
    (digit-value #\x0664) ⇒ 4
    (digit-value #\x0AE6) ⇒ 0
    (digit-value #\x0EA6) ⇒ #f
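One possible use of digit-value is converting a string of decimal digits to a number. The helper digits->integer is hypothetical; it returns #f as soon as it meets a character for which digit-value returns #f.

```scheme
;; Hypothetical helper: read str as a sequence of decimal digits and
;; return the exact integer it denotes, or #f if any character is not
;; a numeric digit.
(define (digits->integer str)
  (let loop ((chars (string->list str)) (acc 0))
    (if (null? chars)
        acc
        (let ((d (digit-value (car chars))))
          (and d (loop (cdr chars) (+ (* acc 10) d)))))))

(digits->integer "2024")                     ; ⇒ 2024
(digits->integer "12a")                      ; ⇒ #f
(digits->integer (string #\x0664 #\x0662))   ; ⇒ 42  Arabic-Indic digits 4 and 2
```

Note that an empty string yields 0 in this sketch, since no digit is rejected.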