Warning: This is the manual of the legacy Guile 2.2 series. You may want to read the manual of the current stable series instead.



6.6.5.15 String Internals

Guile stores each string in memory as a contiguous array of Unicode code points along with an associated set of attributes. If all of the code points of a string have an integer value between 0 and 255 inclusive, the code point array is stored as one byte per code point: it is stored as an ISO-8859-1 (aka Latin-1) string. If any of the code points of the string has an integer value greater than 255, the code point array is stored as four bytes per code point: it is stored as a UTF-32 string.

Conversion between the one-byte-per-code-point and four-bytes-per-code-point representations happens automatically as necessary.

No API is provided to set the internal representation of strings; however, there is a pair of procedures available to query it. These are debugging procedures. Using them in production code is discouraged, since the details of Guile’s internal representation of strings may change from release to release.

Scheme Procedure: string-bytes-per-char str
C Function: scm_string_bytes_per_char (str)

Return the number of bytes used to encode a Unicode code point in string str. The result is one or four.
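
As a minimal sketch of the behavior described above, assuming a Guile 2.2 session, the following builds one string whose code points all fit in the range 0 to 255 and one that contains a code point above 255, then queries each. It also stores a wide character into a narrow string to illustrate the automatic conversion. The variable names narrow, wide, and buf are illustrative only, and the results noted in the comments follow from the description above rather than from any guarantee about future releases.

```scheme
;; All code points are <= 255 (U+00E9 is Latin-1), so the
;; one-byte-per-code-point representation applies.
(define narrow (string #\a #\b (integer->char #xE9)))

;; U+03BB (Greek small letter lambda) is above 255, so the
;; four-bytes-per-code-point representation applies.
(define wide (string #\a (integer->char #x3BB)))

(string-bytes-per-char narrow)   ; 1
(string-bytes-per-char wide)     ; 4

;; A fresh mutable string of three #\a characters starts out narrow.
(define buf (make-string 3 #\a))
(define before (string-bytes-per-char buf))

;; Storing a code point above 255 forces the wide representation.
(string-set! buf 0 (integer->char #x3BB))
(define after (string-bytes-per-char buf))
```

Here before is 1 and after is 4. The code points are built with integer->char rather than written literally so that the example does not depend on the encoding of the source file.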

Scheme Procedure: %string-dump str
C Function: scm_sys_string_dump (str)

Returns an association list containing debugging information for str. The association list has the following entries.

string

The string itself.

start

The start index of the string into its stringbuf

length

The length of the string

shared

The parent string if this string is a substring; otherwise #f

read-only

#t if the string is read-only

stringbuf-chars

A new string containing this string’s stringbuf’s characters

stringbuf-length

The number of characters in this stringbuf

stringbuf-shared

#t if this stringbuf is shared

stringbuf-wide

#t if this stringbuf’s characters are stored in a 32-bit buffer, or #f if they are stored in an 8-bit buffer
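
The entries above can be read back with the ordinary association-list accessors. The following is a minimal sketch, assuming a Guile 2.2 session and assuming that the keys of the returned list are symbols named as listed above; the names s, info, and dump-ref are illustrative only. Because this is a debugging interface, the exact set of entries may differ between releases.

```scheme
;; A fresh five-character string with only ASCII code points.
(define s (string-copy "hello"))

;; Debugging information for s, as an association list.
(define info (%string-dump s))

;; Hypothetical helper: look up one entry by its symbol key.
(define (dump-ref key)
  (assq-ref info key))

(dump-ref 'length)           ; 5
(dump-ref 'stringbuf-wide)   ; #f, since every code point is <= 255
(dump-ref 'read-only)        ; whether s is read-only
(dump-ref 'shared)           ; parent string, or #f
```

Entries such as start and stringbuf-shared depend on how Guile happened to allocate the string, so they are best inspected interactively rather than relied upon.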

