Guile Reference Manual: Representing Strings as Bytes

Warning: This is the manual of the legacy Guile 2.0 series. You may want to read the manual of the current stable series instead.

6.6.5.13 Representing Strings as Bytes

Out in the cold world outside of Guile, not all strings are treated in the same way. Out there, there are only bytes, and there are many ways of representing strings (sequences of characters) as binary data (sequences of bytes).

As a user, usually you don’t have to think about this very much. When you type on your keyboard, your system encodes your keystrokes as bytes according to the locale that you have configured on your computer. Guile uses the locale to decode those bytes back into characters – hopefully the same characters that you typed in.

All is not so clear when dealing with a system with multiple users, such as a web server. Your web server might get a request from one user for data encoded in the ISO-8859-1 character set, and then another request from a different user for UTF-8 data.

Guile provides an iconv module for converting between strings and sequences of bytes. The module gets its name from the common UNIX command of the same name. See Bytevectors, for more on how Guile represents raw byte sequences.

Note that often it is sufficient to just read and write strings from ports instead of using these functions. To do this, specify the port encoding using set-port-encoding!. See Ports, for more on ports and character encodings.
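As a minimal sketch of the port-based approach, the following reads ISO-8859-1 text from a port. The bytevector input port is only a stand-in for a file or socket, and the names port and line are illustrative; the example assumes the (ice-9 binary-ports), (ice-9 rdelim) and (rnrs bytevectors) modules are available.

```scheme
(use-modules (ice-9 binary-ports)
             (ice-9 rdelim)
             (rnrs bytevectors))

;; The bytes of "café" followed by a newline, encoded as ISO-8859-1.
(define port
  (open-bytevector-input-port
   (u8-list->bytevector '(99 97 102 233 10))))

;; Tell Guile how to decode the bytes that arrive on this port.
(set-port-encoding! port "ISO-8859-1")

;; read-line returns a string of characters; the trailing newline is dropped.
(define line (read-line port))
(display (string-length line)) (newline) ; 4
```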

Unlike the other procedures in this section, the ones described here are not available by default; you have to load the iconv module first:

(use-modules (ice-9 iconv))

Scheme Procedure: string->bytevector string encoding [conversion-strategy]

Encode string as a sequence of bytes.

The string will be encoded in the character set specified by the encoding string. If the string has characters that cannot be represented in the encoding, by default this procedure raises an encoding-error. Pass a conversion-strategy argument to specify other behaviors.

The return value is a bytevector. See Bytevectors, for more on bytevectors. See Ports, for more on character encodings and conversion strategies.
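For illustration, here is a small sketch that encodes one string in two character sets and then attempts an encoding that cannot represent it. The names cafe, latin1-bytes, utf8-bytes and ascii-result are made up for this example, and the encoding names are assumed to be supported by the system's iconv.

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; Build "café" without a non-ASCII literal, so the example does not
;; depend on the encoding of the source file itself.
(define cafe (string-append "caf" (string (integer->char #xE9))))

;; U+00E9 takes one byte in ISO-8859-1 but two bytes in UTF-8.
(define latin1-bytes (string->bytevector cafe "ISO-8859-1"))
(define utf8-bytes   (string->bytevector cafe "UTF-8"))

(display (bytevector->u8-list latin1-bytes)) (newline) ; (99 97 102 233)
(display (bytevector->u8-list utf8-bytes))   (newline) ; (99 97 102 195 169)

;; ASCII cannot represent U+00E9.  With the default conversion strategy
;; this raises encoding-error, which is caught here and turned into #f.
(define ascii-result
  (catch 'encoding-error
    (lambda () (string->bytevector cafe "ASCII"))
    (lambda (key . args) #f)))
```

Catching the error is only one option; passing a conversion-strategy argument avoids raising it in the first place.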

Scheme Procedure: bytevector->string bytevector encoding [conversion-strategy]

Decode bytevector into a string.

The bytes will be decoded from the character set specified by the encoding string. If the bytes do not form a valid encoding, by default this procedure raises a decoding-error. As with string->bytevector, pass the optional conversion-strategy argument to modify this behavior. See Ports, for more on character encodings and conversion strategies.
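The sketch below shows why the encoding argument matters when decoding: the same bytes yield different strings depending on the character set named. The names utf8-bytes, decoded and misdecoded are illustrative only.

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; The UTF-8 encoding of "café": the final character takes two bytes.
(define utf8-bytes (u8-list->bytevector '(99 97 102 195 169)))

;; Decoding with the encoding that produced the bytes recovers the text.
(define decoded (bytevector->string utf8-bytes "UTF-8"))
(display (string-length decoded)) (newline) ; 4

;; Decoding the same bytes as ISO-8859-1 also succeeds, because every
;; byte is a valid ISO-8859-1 character, but it yields five characters
;; instead of the intended four.
(define misdecoded (bytevector->string utf8-bytes "ISO-8859-1"))
(display (string-length misdecoded)) (newline) ; 5
```

Note that a wrong encoding does not necessarily raise an error; it can silently produce the wrong characters.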

Scheme Procedure: call-with-output-encoded-string encoding proc [conversion-strategy]

Like call-with-output-string, but instead of returning a string, returns an encoding of the string according to encoding, as a bytevector. This procedure can be more efficient than collecting a string and then converting it via string->bytevector.