The Kawa Scheme language: Bytevectors

14.5 Bytevectors

Bytevectors represent blocks of binary data. They are fixed-length sequences of bytes, where a byte is an exact integer in the range [0, 255]. A bytevector is typically more space-efficient than a vector containing the same values.

The length of a bytevector is the number of elements that it contains. This number is a non-negative integer that is fixed when the bytevector is created. The valid indexes of a bytevector are the exact non-negative integers less than the length of the bytevector, starting at index zero as with vectors.

The bytevector type is equivalent to the u8vector uniform vector type, but is specified by the R7RS standard.

Bytevectors are written using the notation #u8(byte ...). For example, a bytevector of length 3 containing the byte 0 in element 0, the byte 10 in element 1, and the byte 5 in element 2 can be written as follows:

#u8(0 10 5)

Bytevector constants are self-evaluating, so they do not need to be quoted in programs.

Type: bytevector: The type of bytevector objects.

Constructor: bytevector byte …

Return a newly allocated bytevector whose elements contain the given arguments. Analogous to vector.

(bytevector 1 3 5 1 3 5)  ⇒  #u8(1 3 5 1 3 5)
(bytevector)  ⇒  #u8()

Procedure: bytevector? obj: Return #t if obj is a bytevector, #f otherwise.

Procedure: make-bytevector k

Procedure: make-bytevector k byte

The make-bytevector procedure returns a newly allocated bytevector of length k. If byte is given, then all elements of the bytevector are initialized to byte, otherwise the contents of each element are unspecified.

(make-bytevector 2 12) ⇒ #u8(12 12)

Procedure: bytevector-length bytevector: Returns the length of bytevector in bytes as an exact integer.
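The predicate, allocator, and length procedures described so far can be combined as in the following minimal sketch. The name buf is illustrative, and the import line is only needed outside Kawa's default environment.

```scheme
;; Only needed in strict R7RS implementations.
(import (scheme base))

;; Allocate 4 bytes, all initialized to 0.
(define buf (make-bytevector 4 0))

(bytevector? buf)           ; ⇒ #t
(bytevector? (vector 0 0))  ; ⇒ #f, an ordinary vector is not a bytevector
(bytevector-length buf)     ; ⇒ 4
(bytevector-length #u8())   ; ⇒ 0
```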

Procedure: bytevector-u8-ref bytevector k

It is an error if k is not a valid index of bytevector. Returns the kth byte of bytevector.

(bytevector-u8-ref '#u8(1 1 2 3 5 8 13 21) 5)
  ⇒ 8

Procedure: bytevector-u8-set! bytevector k byte

It is an error if k is not a valid index of bytevector. Stores byte as the kth byte of bytevector.

(let ((bv (bytevector 1 2 3 4)))
  (bytevector-u8-set! bv 1 3)
  bv)
  ⇒ #u8(1 3 3 4)
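As a slightly larger illustration, element access and mutation can drive simple loops over the valid indexes. The helper names iota-bytes and bytevector-sum below are hypothetical and not part of Kawa; the import line is only needed outside Kawa's default environment.

```scheme
;; Only needed in strict R7RS implementations.
(import (scheme base))

;; Fill a fresh bytevector so that element i holds the value i.
;; n must not exceed 256, since every element has to be a byte.
(define (iota-bytes n)
  (let ((bv (make-bytevector n 0)))
    (do ((i 0 (+ i 1)))
        ((= i n) bv)
      (bytevector-u8-set! bv i i))))

;; Sum the elements by reading each valid index in turn.
(define (bytevector-sum bv)
  (let loop ((i 0) (acc 0))
    (if (= i (bytevector-length bv))
        acc
        (loop (+ i 1) (+ acc (bytevector-u8-ref bv i))))))

(iota-bytes 4)                   ; ⇒ #u8(0 1 2 3)
(bytevector-sum (iota-bytes 4))  ; ⇒ 6
```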

Procedure: bytevector-copy bytevector [start [end]]

Returns a newly allocated bytevector containing the bytes in bytevector from index start (inclusive) to index end (exclusive).

(define a #u8(1 2 3 4 5))
(bytevector-copy a 2 4)
    ⇒ #u8(3 4)
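The optional arguments can also be left out. The sketch below assumes the R7RS defaults, where start defaults to 0 and end to the length of the bytevector, and it shows that the result does not share storage with the original. The names a and c are illustrative.

```scheme
;; Only needed in strict R7RS implementations.
(import (scheme base))

(define a (bytevector 1 2 3 4 5))

(bytevector-copy a)    ; ⇒ #u8(1 2 3 4 5), the whole bytevector
(bytevector-copy a 2)  ; ⇒ #u8(3 4 5), from index 2 to the end

;; The copy is independent of the original.
(define c (bytevector-copy a))
(bytevector-u8-set! c 0 99)
a                      ; ⇒ #u8(1 2 3 4 5)
c                      ; ⇒ #u8(99 2 3 4 5)
```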

Procedure: bytevector-copy! to at from [start [end]]

Copies the bytes of bytevector from between start and end to bytevector to, starting at at. The order in which bytes are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source were first copied into a temporary bytevector and then into the destination. This is achieved without allocating storage by copying in the correct direction in such circumstances.

It is an error if at is less than zero or greater than the length of to. It is also an error if (- (bytevector-length to) at) is less than (- end start).

(define a (bytevector 1 2 3 4 5))
(define b (bytevector 10 20 30 40 50))
(bytevector-copy! b 1 a 0 2)
b        ⇒ #u8(10 1 2 40 50)
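The overlap rule matters when the source and destination are the same bytevector. The following minimal sketch shifts bytes to the right in place; the name bv is illustrative.

```scheme
;; Only needed in strict R7RS implementations.
(import (scheme base))

(define bv (bytevector 1 2 3 4 5))

;; Shift the first four bytes one position to the right, in place.
;; The source range [0, 4) and the destination range [1, 5) overlap,
;; yet the result is the same as copying through a temporary bytevector.
(bytevector-copy! bv 1 bv 0 4)
bv  ; ⇒ #u8(1 1 2 3 4)
```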

Procedure: bytevector-append bytevector …

Returns a newly allocated bytevector whose elements are the concatenation of the elements in the given bytevectors.

(bytevector-append #u8(0 1 2) #u8(3 4 5))
        ⇒  #u8(0 1 2 3 4 5)

14.5.1 Converting to or from strings

Procedure: utf8->string bytevector [start [end]]

This procedure decodes the bytes of a bytevector between start and end, interpreting them as a UTF-8-encoded string, and returns the corresponding string. It is an error for bytevector to contain invalid UTF-8 byte sequences.

(utf8->string #u8(#x41))  ⇒ "A"
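Because start and end are byte indexes, not character indexes, a multi-byte character has to be selected as a whole. A minimal sketch, with the illustrative name encoded:

```scheme
;; Only needed in strict R7RS implementations.
(import (scheme base))

;; "λx" in UTF-8: λ is the two bytes #xCE #xBB, x is the single byte #x78.
(define encoded (bytevector #xCE #xBB #x78))

(utf8->string encoded)      ; ⇒ "λx"
(utf8->string encoded 2)    ; ⇒ "x", decoding starts at byte index 2
(utf8->string encoded 0 2)  ; ⇒ "λ"
```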

Procedure: utf16->string bytevector [start [end]]

Procedure: utf16be->string bytevector [start [end]]

Procedure: utf16le->string bytevector [start [end]]

These procedures interpret their bytevector argument as a UTF-16 encoding of a sequence of characters, and return an istring containing that sequence.

The bytevector subrange given to utf16->string may begin with a byte order mark (BOM); if so, that BOM determines whether the rest of the subrange is to be interpreted as big-endian or little-endian; in either case, the BOM will not become a character in the returned string. If the subrange does not begin with a BOM, it is decoded using the same implementation-dependent endianness used by string->utf16.

The utf16be->string and utf16le->string procedures interpret their inputs as big-endian or little-endian, respectively. If a BOM is present, it is treated as a normal character and will become part of the result.

It is an error if (- end start) is odd, or if the bytevector subrange contains invalid UTF-16 byte sequences.
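The following sketch illustrates the byte-order rules with the single character A (U+0041). It assumes Kawa's default environment, since these procedures are not part of R7RS, and the expected values follow from the description above.

```scheme
;; "A" is U+0041: big-endian UTF-16 is 00 41, little-endian is 41 00.
(utf16be->string (bytevector #x00 #x41))          ; ⇒ "A"
(utf16le->string (bytevector #x41 #x00))          ; ⇒ "A"

;; A leading BOM (FE FF for big-endian, FF FE for little-endian)
;; selects the byte order for utf16->string and is not part of the result.
(utf16->string (bytevector #xFE #xFF #x00 #x41))  ; ⇒ "A"
(utf16->string (bytevector #xFF #xFE #x41 #x00))  ; ⇒ "A"
```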

Procedure: string->utf8 string [start [end]]

This procedure encodes the characters of a string between start and end and returns the corresponding bytevector, in UTF-8 encoding.

(string->utf8 "λ")     ⇒ #u8(#xCE #xBB)
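Here start and end are character indexes into the string, while the result is measured in bytes. A minimal sketch that also shows a round trip through utf8->string:

```scheme
;; Only needed in strict R7RS implementations.
(import (scheme base))

(string->utf8 "Aλ")                     ; ⇒ #u8(#x41 #xCE #xBB)
(string->utf8 "Aλ" 1)                   ; ⇒ #u8(#xCE #xBB), start is a character index
(bytevector-length (string->utf8 "λ"))  ; ⇒ 2, one character but two bytes
(utf8->string (string->utf8 "Aλ"))      ; ⇒ "Aλ"
```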

Procedure: string->utf16 string [start [end]]

Procedure: string->utf16be string [start [end]]

Procedure: string->utf16le string [start [end]]

These procedures return a newly allocated (unless empty) bytevector containing a UTF-16 encoding of the given substring.

The bytevectors returned by string->utf16be and string->utf16le do not contain a byte-order mark (BOM); string->utf16be returns a big-endian encoding, while string->utf16le returns a little-endian encoding.

The bytevectors returned by string->utf16 begin with a BOM that declares an implementation-dependent endianness, and the bytevector elements following that BOM encode the given substring using that endianness.
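The following sketch illustrates the three encoders with the single character A (U+0041). It assumes Kawa's default environment, since these procedures are not part of R7RS. Only the length of the string->utf16 result is shown, because its byte order is implementation-dependent.

```scheme
(string->utf16be "A")  ; ⇒ #u8(#x00 #x41)
(string->utf16le "A")  ; ⇒ #u8(#x41 #x00)

;; string->utf16 prepends a 2-byte BOM, so "A" takes 4 bytes in total.
(bytevector-length (string->utf16 "A"))  ; ⇒ 4

;; The BOM lets utf16->string recover the text whatever the byte order.
(utf16->string (string->utf16 "hello"))  ; ⇒ "hello"
```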

Rationale: These procedures are consistent with the Unicode standard. Unicode suggests UTF-16 should default to big-endian, but Microsoft prefers little-endian.