Strings are sequences of characters. The length of a string is the number of characters that it contains, as an exact non-negative integer. The valid indices of a string are the exact non-negative integers less than the length of the string. The first character of a string has index 0, the second has index 1, and so on.
Strings are implemented as a sequence of 16-bit char values, even though they’re semantically a sequence of 32-bit Unicode code points. A character whose value is greater than #xffff is represented using two surrogate characters. The implementation allows for natural interoperability with Java APIs. However, it does make certain operations (indexing or counting based on character counts) difficult to implement efficiently. Luckily one rarely needs to index or count based on character counts; alternatives are discussed below.
There are different kinds of strings:
An istring is an immutable string. Indexing an istring (for example with string-ref) is efficient (constant-time), while indexing of other string implementations takes time proportional to the index. String literals are istrings, as are the return values of most of the procedures in this chapter. An istring is an instance of the gnu.lists.IString class.
An mstring is a mutable string: you can replace individual characters (using string-set!). You can also change the mstring’s length by inserting or removing characters (using string-append! or string-replace!). An mstring is an instance of the gnu.lists.FString class.
Any other object that implements the java.lang.CharSequence interface is also a string. This includes standard Java java.lang.String and java.lang.StringBuilder objects.
Some of the procedures that operate on strings ignore the
difference between upper and lower case. The names of
the versions that ignore case end with “-ci
” (for “case
insensitive”).
Compatibility: Many of the following procedures (for example string-append) return an immutable istring in Kawa, but return a “freshly allocated” mutable string in standard Scheme (including R7RS) as well as most Scheme implementations (including previous versions of Kawa). To get the “compatibility mode” versions of those procedures (which return mstrings), invoke Kawa with one of the --r5rs, --r6rs, or --r7rs options, or import a standard library like (scheme base).
The type of string objects. The underlying type is the interface java.lang.CharSequence. Immutable strings are gnu.lists.IString or java.lang.String, while mutable strings are gnu.lists.FString.
Return #t
if obj is a string, #f
otherwise.
Return #t if obj is an istring (an immutable, constant-time-indexable string); #f otherwise.
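As a rough illustration of the two predicates, the following sketch assumes Kawa's default (non-compatibility) mode. The expected values follow from the statements in this chapter that string literals are istrings and that make-string returns an mstring.

```scheme
;; String literals are istrings.
(string? "abc")                  ; ⇒ #t
(istring? "abc")                 ; ⇒ #t

;; make-string returns a mutable mstring:
;; it is a string, but not an istring.
(string? (make-string 3 #\x))    ; ⇒ #t
(istring? (make-string 3 #\x))   ; ⇒ #f
```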
Return a string composed of the arguments. This is analogous to list.
Compatibility: The result is an istring, except in compatibility mode, when it is a newly allocated mstring.
Return the number of characters in the given string as an exact integer object.
Performance note: If the string is not an istring, then calling string-length may take time proportional to the length of the string, because of the need to scan for surrogate pairs.
k must be a valid index of string. The string-ref
procedure returns character k of string using zero–origin
indexing.
Performance note: If the string is not an istring,
then calling string-ref
may take time proportional
to k because of the need to check for surrogate pairs.
An alternative is to use string-cursor-ref
.
If iterating through a string, use string-for-each
.
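The advice to iterate rather than index can be sketched as follows. The helper name count-char is hypothetical (it is not part of Kawa); it visits each character once with string-for-each instead of calling string-ref in a loop.

```scheme
;; Hypothetical helper: count occurrences of ch in s without indexing.
(define (count-char s ch)
  (let ((n 0))
    (string-for-each
      (lambda (c)
        (when (char=? c ch)
          (set! n (+ n 1))))
      s)
    n))

(count-char "banana" #\a)   ; ⇒ 3
```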
Is string the empty string?
Same result as (= (string-length string) 0)
but
executes in O(1) time.
Checks to see if every/any character in string satisfies pred,
proceeding from left (index start) to right (index end). These
procedures are short-circuiting: if pred returns false,
string-every
does not call pred on subsequent characters;
if pred returns true, string-any
does not call pred
on subsequent characters. Both procedures are “witness-generating”:
If string-every is given an empty interval (with start = end), it returns #t.
If string-every returns true for a non-empty interval (with start < end), the returned true value is the one returned by the final call to the predicate on (string-ref string (- end 1)).
If string-any returns true, the returned true value is the one returned by the predicate.
Note: The names of these procedures do not end with a question
mark. This indicates a general value is returned instead of a simple
boolean (#t
or #f
).
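The “witness-generating” behaviour might look like this in practice. This is a sketch that assumes the argument order pred, string (as in SRFI 13 and SRFI 140); digit-or-false is a hypothetical predicate that returns the character itself rather than #t.

```scheme
;; Hypothetical predicate: returns the character as a witness, or #f.
(define (digit-or-false c)
  (and (char-numeric? c) c))

(string-any digit-or-false "abc7de9")   ; ⇒ #\7  (first true result)
(string-any digit-or-false "abcde")     ; ⇒ #f
(string-every digit-or-false "2024")    ; ⇒ #\4  (result of the final call)
(string-every digit-or-false "")        ; ⇒ #t   (empty interval)
```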
Constructs a string of size len by calling proc on each value from 0 (inclusive) to len (exclusive) to produce the corresponding element of the string. The procedure proc accepts an exact integer as its argument and returns a character. The order in which proc is called on those indexes is not specified.
Rationale: Although string-unfold
is more general,
string-tabulate
is likely to run faster for the common special
case it implements.
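A small sketch of string-tabulate, with proc as the first argument and len as the second, as described above. The index-to-letter mapping is only an illustration.

```scheme
;; Build the first five lower-case letters from their indexes 0..4.
(string-tabulate
  (lambda (i) (integer->char (+ i (char->integer #\a))))
  5)                                   ; ⇒ "abcde"
```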
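This is a fundamental and powerful constructor for strings.
successor is used to generate a series of “seed” values from the initial seed: seed, (successor seed), (successor (successor seed)), (successor (successor (successor seed))), ...
stop? tells us when to stop: when it returns true when applied to one of these seed values.
mapper maps each seed value to the corresponding character(s) in the result string, which are assembled into that string in left-to-right order. It is an error for mapper to return anything other than a character or string.
base is the optional initial (leftmost) portion of the constructed string, which defaults to the empty string "". It is an error if base is anything other than a character or string.
make-final is applied to the terminal seed value (on which stop? returns true) to produce the final (rightmost) portion of the constructed string. It defaults to (lambda (x) ""). It is an error for make-final to return anything other than a character or string.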
string-unfold-right
is the same as string-unfold
except the
results of mapper are assembled into the string in right-to-left order,
base is the optional rightmost portion of the constructed string, and
make-final produces the leftmost portion of the constructed string.
You can use string-unfold to convert a list to a string, read a port into a string, reverse a string, copy a string, and so forth.
Examples:
(define (port->string p)
  (string-unfold eof-object? values
                 (lambda (x) (read-char p))
                 (read-char p)))

(define (list->string lis)
  (string-unfold null? car cdr lis))

(define (string-tabulate f size)
  (string-unfold (lambda (i) (= i size)) f (lambda (i) (+ i 1)) 0))
To map f over a list lis, producing a string:
(string-unfold null? (compose f car) cdr lis)
Interested functional programmers may enjoy noting that string-fold-right
and string-unfold
are in some sense inverses.
That is, given operations knull?, kar, kdr,
kons, and knil satisfying
(kons (kar x) (kdr x)) = x and (knull? knil) = #t
then
(string-fold-right kons knil (string-unfold knull? kar kdr x)) = x
and
(string-unfold knull? kar kdr (string-fold-right kons knil string)) = string.
This combinator pattern is sometimes called an “anamorphism.”
string must be a string, and start and end must be exact integer objects satisfying:
0 <= start <= end <= (string-length string)
The substring
procedure returns a newly allocated string formed
from the characters of string beginning with index start
(inclusive) and ending with index end (exclusive).
string-take
returns an immutable string containing the
first nchars of string;
string-drop
returns a string containing all but the first nchars
of string.
string-take-right
returns a string containing the last nchars
of string; string-drop-right
returns a string containing all
but the last nchars of string.
(string-take "Pete Szilagyi" 6) ⇒ "Pete S"
(string-drop "Pete Szilagyi" 6) ⇒ "zilagyi"
(string-take-right "Beta rules" 5) ⇒ "rules"
(string-drop-right "Beta rules" 5) ⇒ "Beta "
It is an error to take or drop more characters than are in the string:
(string-take "foo" 37) ⇒ error
Returns an istring of length len comprised of the characters
drawn from the given subrange of string,
padded on the left (right) by as many occurrences of the
character char as needed.
If string has more than len chars, it is truncated on the
left (right) to length len.
The char defaults to #\space.
(string-pad "325" 5) ⇒ "  325"
(string-pad "71325" 5) ⇒ "71325"
(string-pad "8871325" 5) ⇒ "71325"
Returns an istring obtained from the given subrange of string by skipping over all characters on the left / on the right / on both sides that satisfy the second argument pred. pred defaults to char-whitespace?.
(string-trim-both " The outlook wasn't brilliant, \n\r") ⇒ "The outlook wasn't brilliant,"
Return #t
if the strings are the same length and contain the same
characters in the same positions. Otherwise, the string=?
procedure returns #f
.
(string=? "Straße" "Strasse") ⇒ #f
These procedures return #t
if their arguments are (respectively):
monotonically increasing, monotonically decreasing,
monotonically non-decreasing, or monotonically nonincreasing.
These predicates are required to be transitive.
These procedures are the lexicographic extensions to strings of the
corresponding orderings on characters. For example, string<?
is
the lexicographic ordering on strings induced by the ordering
char<?
on characters. If two strings differ in length but are
the same up to the length of the shorter string, the shorter string is
considered to be lexicographically less than the longer string.
(string<? "z" "ß") ⇒ #t
(string<? "z" "zz") ⇒ #t
(string<? "z" "Z") ⇒ #f
These procedures are similar to string=?
, etc.,
but behave as if they applied string-foldcase
to their arguments
before invoking the corresponding procedures without -ci
.
(string-ci<? "z" "Z") ⇒ #f
(string-ci=? "z" "Z") ⇒ #t
(string-ci=? "Straße" "Strasse") ⇒ #t
(string-ci=? "Straße" "STRASSE") ⇒ #t
(string-ci=? "ΧΑΟΣ" "χαοσ") ⇒ #t
The list->string
procedure returns an istring
formed from the characters in list, in order.
It is an error if any element of list is not a character.
Compatibility: The result is an istring, except in compatibility mode, when it is an mstring.
An efficient implementation of (compose list->string reverse):
(reverse-list->string '(#\a #\B #\c)) ⇒ "cBa"
This is a common idiom in the epilogue of string-processing loops
that accumulate their result using a list in reverse order.
(See also string-concatenate-reverse
for the “chunked” variant.)
The string->list
procedure returns a newly allocated list of the
characters of string between start and end, in order.
The string->list
and list->string
procedures are inverses
so far as equal?
is concerned.
The vector->string
procedure returns a newly allocated
string of the objects contained in the elements of vector
between start and end.
It is an error if any element of vector between start
and end is not a character, or is a character forbidden in strings.
(vector->string #(#\1 #\2 #\3)) ⇒ "123"
(vector->string #(#\1 #\2 #\3 #\4 #\5) 2 4) ⇒ "34"
The string->vector
procedure
returns a newly created vector initialized to the elements
of the string string between start and end.
(string->vector "ABC") ⇒ #(#\A #\B #\C)
(string->vector "ABCDE" 1 3) ⇒ #(#\B #\C)
These procedures take a string argument and return a string result.
They are defined in terms of Unicode’s locale–independent case mappings
from Unicode scalar–value sequences to scalar–value sequences. In
particular, the length of the result string can be different from the
length of the input string. When the specified result is equal in the
sense of string=?
to the argument, these procedures may return
the argument instead of a newly allocated string.
The string-upcase
procedure converts a string to upper case;
string-downcase
converts a string to lower case. The
string-foldcase
procedure converts the string to its case–folded
counterpart, using the full case–folding mapping, but without the
special mappings for Turkic languages. The string-titlecase
procedure converts the first cased character of each word, and downcases
all other cased characters.
(string-upcase "Hi") ⇒ "HI"
(string-downcase "Hi") ⇒ "hi"
(string-foldcase "Hi") ⇒ "hi"
(string-upcase "Straße") ⇒ "STRASSE"
(string-downcase "Straße") ⇒ "straße"
(string-foldcase "Straße") ⇒ "strasse"
(string-downcase "STRASSE") ⇒ "strasse"
(string-downcase "Σ") ⇒ "σ"
; Chi Alpha Omicron Sigma:
(string-upcase "ΧΑΟΣ") ⇒ "ΧΑΟΣ"
(string-downcase "ΧΑΟΣ") ⇒ "χαος"
(string-downcase "ΧΑΟΣΣ") ⇒ "χαοσς"
(string-downcase "ΧΑΟΣ Σ") ⇒ "χαος σ"
(string-foldcase "ΧΑΟΣΣ") ⇒ "χαοσσ"
(string-upcase "χαος") ⇒ "ΧΑΟΣ"
(string-upcase "χαοσ") ⇒ "ΧΑΟΣ"
(string-titlecase "kNock KNoCK") ⇒ "Knock Knock"
(string-titlecase "who's there?") ⇒ "Who's There?"
(string-titlecase "r6rs") ⇒ "R6rs"
(string-titlecase "R6RS") ⇒ "R6rs"
Since these procedures are locale–independent, they may not be appropriate for some locales.
Kawa Note: The implementation of string-titlecase
does not correctly handle the case where an initial character
needs to be converted to multiple characters, such as
“LATIN SMALL LIGATURE FL” which should be converted to
the two letters "Fl"
.
Compatibility: The result is an istring, except in compatibility mode, when it is an mstring.
These procedures take a string argument and return a string result,
which is the input string normalized to Unicode normalization form D,
KD, C, or KC, respectively. When the specified result is equal in the
sense of string=?
to the argument, these procedures may return
the argument instead of a newly allocated string.
(string-normalize-nfd "\xE9;") ⇒ "\x65;\x301;"
(string-normalize-nfc "\xE9;") ⇒ "\xE9;"
(string-normalize-nfd "\x65;\x301;") ⇒ "\x65;\x301;"
(string-normalize-nfc "\x65;\x301;") ⇒ "\xE9;"
Return the length of the longest common prefix/suffix of string1 and string2. For prefixes, this is equivalent to their “mismatch index” (relative to the start indexes).
The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.
Is string1 a prefix/suffix of string2?
The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.
string-index
searches through the given substring from the
left, returning the index of the leftmost character satisfying the
predicate pred.
string-index-right
searches from the right, returning
the index of the rightmost character satisfying the predicate pred.
If no match is found, these procedures return #f
.
The start and end arguments specify the beginning and end
of the search; the valid indexes relevant to the search include start
but exclude end. Beware of “fencepost” errors: when searching
right-to-left, the first index considered is (- end 1)
,
whereas when searching left-to-right, the first index considered is start.
That is, the start/end indexes describe the same half-open interval
[start,end)
in these procedures that they do in other string procedures.
The -skip
functions are similar, but use the complement of the
criterion: they search for the first char that doesn’t satisfy
pred. To skip over initial whitespace, for example, say
(substring string (or (string-skip string char-whitespace?) (string-length string)) (string-length string))
These functions can be trivially composed with string-take
and
string-drop
to produce take-while
, drop-while
,
span
, and break
procedures without loss of efficiency.
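The composition mentioned above could be sketched as follows. The names string-take-while and string-drop-while are hypothetical helpers, not Kawa procedures; they rely on string-skip returning #f when every character satisfies pred.

```scheme
;; Hypothetical helpers built from string-skip, string-take and string-drop.
(define (string-take-while s pred)
  (string-take s (or (string-skip s pred) (string-length s))))

(define (string-drop-while s pred)
  (string-drop s (or (string-skip s pred) (string-length s))))

(string-take-while "123abc" char-numeric?)   ; ⇒ "123"
(string-drop-while "123abc" char-numeric?)   ; ⇒ "abc"
(string-drop-while "123" char-numeric?)      ; ⇒ ""
```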
Does the substring of string1 specified by start1 and end1 contain the sequence of characters given by the substring of string2 specified by start2 and end2?
Returns #f
if there is no match.
If start2 = end2, string-contains
returns start1 but string-contains-right
returns end1. Otherwise returns the index in string1 for the first character of the first/last match; that index lies within the half-open interval [start1,end1), and the match lies entirely within the [start1,end1) range of string1.
(string-contains "eek -- what a geek." "ee" 12 18) ; Searches "a geek" ⇒ 15
Note: The names of these procedures do not end with a question mark. This indicates a useful value is returned when there is a match.
Returns a string whose characters form the concatenation of the given strings.
Compatibility: The result is an istring, except in compatibility mode, when it is an mstring.
Concatenates the elements of string-list together into a single istring.
Rationale: Some implementations of Scheme limit the number of
arguments that may be passed to an n-ary procedure, so the
(apply string-append string-list)
idiom,
which is otherwise equivalent to using this procedure, is not as portable.
With no optional arguments, calling this procedure is equivalent to
(string-concatenate (reverse string-list))
.
If the optional argument final-string is specified,
it is effectively consed onto the beginning of string-list
before performing the list-reverse and string-concatenate operations.
If the optional argument end is given, only the characters up to but not including end in final-string are added to the result, thus producing
(string-concatenate (reverse (cons (substring final-string 0 end) string-list)))
For example:
(string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7) ⇒ "Hello, I must be going."
Rationale: This procedure is useful when constructing procedures that accumulate character data into lists of string buffers, and wish to convert the accumulated data into a single string when done. The optional end argument accommodates that use case when final-string is a non-full mutable string, and is allowed (for uniformity) when final-string is an immutable string.
This procedure is a simple unparser; it pastes strings together using the delimiter string, returning an istring.
The string-list is a list of strings.
The delimiter is the string used to delimit elements; it defaults to a single space " "
.
The grammar argument is a symbol that determines how the delimiter
is used, and defaults to 'infix
.
It is an error for grammar to be any symbol other than these four:
'infix
An infix or separator grammar: insert the delimiter between list elements. An empty list will produce an empty string.
'strict-infix
Means the same as 'infix
if the string-list is non-empty,
but will signal an error if given an empty list.
(This avoids an ambiguity shown in the examples below.)
'suffix
Means a suffix or terminator grammar: insert the delimiter after every list element.
'prefix
Means a prefix grammar: insert the delimiter before every list element.
(string-join '("foo" "bar" "baz")) ⇒ "foo bar baz"
(string-join '("foo" "bar" "baz") "") ⇒ "foobarbaz"
(string-join '("foo" "bar" "baz") ":") ⇒ "foo:bar:baz"
(string-join '("foo" "bar" "baz") ":" 'suffix) ⇒ "foo:bar:baz:"
;; Infix grammar is ambiguous wrt empty list vs. empty string:
(string-join '() ":") ⇒ ""
(string-join '("") ":") ⇒ ""
;; Suffix and prefix grammars are not:
(string-join '() ":" 'suffix) ⇒ ""
(string-join '("") ":" 'suffix) ⇒ ":"
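The examples above do not exercise the 'prefix grammar. A short sketch, using made-up path components as data, might be:

```scheme
;; 'prefix inserts the delimiter before every list element.
(string-join '("usr" "local" "bin") "/" 'prefix)   ; ⇒ "/usr/local/bin"

;; 'infix (the default) only separates the elements.
(string-join '("usr" "local" "bin") "/")           ; ⇒ "usr/local/bin"
```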
Returns
(string-append (substring string1 0 start1) (substring string2 start2 end2) (substring string1 end1 (string-length string1)))
That is, the segment of characters in string1 from start1 to end1 is replaced by the segment of characters in string2 from start2 to end2. If start1=end1, this simply splices the characters drawn from string2 into string1 at that position.
Examples:
(string-replace "The TCL programmer endured daily ridicule."
                "another miserable perl drone" 4 7 8 22)
    ⇒ "The miserable perl programmer endured daily ridicule."

(string-replace "It's easy to code it up in Scheme." "lots of fun" 5 9)
    ⇒ "It's lots of fun to code it up in Scheme."

(define (string-insert s i t) (string-replace s t i i))

(string-insert "It's easy to code it up in Scheme." 5 "really ")
    ⇒ "It's really easy to code it up in Scheme."

(define (string-set s i c) (string-replace s (string c) i (+ i 1)))

(string-set "String-ref runs in O(n) time." 19 #\1)
    ⇒ "String-ref runs in O(1) time."
Also see string-append!
and string-replace!
for destructive changes to a mutable string.
These are the fundamental iterators for strings.
The string-fold
procedure maps the kons procedure across
the given string from left to right:
(... (kons string[2] (kons string[1] (kons string[0] knil))))
Here string[i] denotes (string-ref string i). In other words, string-fold obeys the (tail) recursion
(string-fold kons knil string start end) = (string-fold kons (kons string[start] knil) string start+1 end)
The string-fold-right procedure maps kons across the given string from right to left:
(kons string[0] (... (kons string[end-3] (kons string[end-2] (kons string[end-1] knil)))))
obeying the (tail) recursion
(string-fold-right kons knil string start end) = (string-fold-right kons (kons string[end-1] knil) string start end-1)
Examples:
;;; Convert a string to a list of chars.
(string-fold-right cons '() string)

;;; Count the number of lower-case characters in a string.
(string-fold (lambda (c count)
               (if (char-lower-case? c) (+ count 1) count))
             0
             string)
The string-fold-right combinator is sometimes called a "catamorphism."
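Because string-fold visits characters from left to right and passes each one to kons together with the accumulated value, consing them up yields the characters in reverse. The helper reverse-string below is hypothetical and only illustrates the fold order.

```scheme
;; Hypothetical helper: reverse a string with a left-to-right fold.
;; (string-fold cons '() "abc") builds (#\c #\b #\a).
(define (reverse-string s)
  (list->string (string-fold cons '() s)))

(reverse-string "abc")   ; ⇒ "cba"
```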
The strings must all have the same length. proc should accept as many arguments as there are strings.
The start-end variant is provided for compatibility with the SRFI-13 version. (In that case start and end count Unicode scalar values (character values), not Java 16-bit char values.)
The string-for-each
procedure applies proc element–wise to
the characters of the strings for its side effects, in order from
the first characters to the last. proc is always called in the
same dynamic environment as string-for-each
itself.
Analogous to for-each
.
(let ((v '()))
  (string-for-each
    (lambda (c) (set! v (cons (char->integer c) v)))
    "abcde")
  v) ⇒ (101 100 99 98 97)
Performance note: The compiler generates efficient code
for string-for-each
.
If proc is a lambda expression, it is inlined.
The string-map
procedure applies proc element-wise to
the elements of the strings and returns a string of the results, in order.
It is an error if proc does not accept as many arguments as there
are strings, or return other than a single character or a string.
If more than one string is given and not all strings have the same length,
string-map
terminates when the shortest string runs out.
The dynamic order in
which proc is applied to the elements of the strings is unspecified.
(string-map char-foldcase "AbdEgH") ⇒ "abdegh"
(string-map (lambda (c) (integer->char (+ 1 (char->integer c)))) "HAL") ⇒ "IBM"
(string-map (lambda (c k) ((if (eqv? k #\u) char-upcase char-downcase) c)) "studlycaps xxx" "ululululul") ⇒ "StUdLyCaPs"
Traditionally the result of proc had to be a character, but Kawa (and SRFI-140) allows the result to be a string.
Performance note: The string-map
procedure has not been
optimized (mainly because it is not very useful):
The characters are boxed, and the proc is not inlined even if
it is a lambda expression.
Calls proc on each valid index of the specified substring, converts
the results of those calls into strings, and returns the concatenation
of those strings. It is an error for proc to return anything other
than a character or string. The dynamic order in which proc is called
on the indexes is unspecified, as is the dynamic order in which the
coercions are performed. If any strings returned by proc are mutated
after they have been returned and before the call to string-map-index
has returned, then string-map-index
returns a string with unspecified
contents; the string-map-index procedure itself does not mutate those
strings.
Calls proc on each valid index of the specified substring, in increasing order, discarding the results of those calls. This is simply a safe and correct way to loop over a substring.
Example:
(let ((txt "abcde")
      (v '()))
  (string-for-each-index
    (lambda (cur) (set! v (cons (char->integer (string-ref txt cur)) v)))
    txt)
  v) ⇒ (101 100 99 98 97)
Returns a count of the number of characters in the specified substring of string that satisfy the predicate pred.
Return an immutable string consisting of only selected characters, in order: string-filter selects only the characters that satisfy pred; string-remove selects only the characters that do not satisfy pred.
Create an istring by repeating the first argument len times.
If the first argument is a character, it is as if it were wrapped with
the string
constructor.
We can define string-repeat in terms of the more general xsubstring
procedure:
(define (string-repeat S N)
  (let ((T (if (char? S) (string S) S)))
    (xsubstring T 0 (* N (string-length T)))))
This is an extended substring procedure that implements replicated copying of a substring.
The string is a string; start and end are optional arguments that specify a substring of string,
defaulting to 0 and the length of string.
This substring is conceptually replicated both up and down the index space,
in both the positive and negative directions.
For example, if string is "abcdefg"
, start is 3,
and end is 6, then we have the conceptual bidirectionally-infinite string
...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
    -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9
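One way to read this diagram: index i of the replicated string corresponds to index start + (i mod (end - start)) of string. The helper replicated-ref below is a hypothetical sketch of that mapping, not part of Kawa, and assumes start < end.

```scheme
;; Hypothetical helper: the character at index i of the replicated
;; index space of the substring [start, end) of s. Assumes start < end.
;; modulo takes the sign of the divisor, so negative i wraps correctly.
(define (replicated-ref s start end i)
  (string-ref s (+ start (modulo i (- end start)))))

(replicated-ref "abcdefg" 3 6 0)    ; ⇒ #\d
(replicated-ref "abcdefg" 3 6 -1)   ; ⇒ #\f
(replicated-ref "abcdefg" 3 6 4)    ; ⇒ #\e
```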
xsubstring
returns the substring of the string beginning
at index from, and ending at to.
It is an error if from is greater than to.
If from and to are missing they default to 0 and
from+(end-start), respectively.
This variant is a generalization of using substring
,
but unlike substring
never shares substructures that would
retain characters or sequences of characters that are substructures of
its first argument or previously allocated objects.
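You can use xsubstring to perform a variety of tasks:
To rotate a string left: (xsubstring "abcdef" 2 8) ⇒ "cdefab"
To rotate a string right: (xsubstring "abcdef" -2 4) ⇒ "efabcd"
To replicate a string: (xsubstring "abc" 0 7) ⇒ "abcabca"
Note that the from/to arguments give a half-open range: the characters from index from up to, but not including, index to. The from/to indexes are not expressed in the index space of string; they refer to the replicated index space of the substring defined by string, start, and end.
It is an error if start=end, unless from=to, which is allowed as a special case.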
Returns a list of strings representing the words contained in the substring of string from start (inclusive) to end (exclusive).
The delimiter is a string to be used as the word separator.
This will often be a single character, but multiple characters are
allowed for use cases such as splitting on "\r\n"
.
The returned list will have one more item than the number of non-overlapping
occurrences of the delimiter in the string.
If delimiter is an empty string, then the returned list contains one string for each character of the string, each holding that single character.
The grammar is a symbol with the same meaning as in
the string-join
procedure.
If it is infix
, which is the default, processing is done as
described above, except an empty string produces the empty list;
if grammar is strict-infix
, then an empty string signals an error.
The values prefix
and suffix
cause a leading/trailing empty string in the result to be suppressed.
If limit is a non-negative exact integer, at most that many splits occur, and the remainder of string is returned as the final element of the list (so the result will have at most limit+1 elements). If limit is not specified or is #f, then as many splits as possible are made. It is an error if limit is any other value.
To split on a regular expression, you can use SRFI 115’s regexp-split
procedure.
The following procedures create a mutable string, i.e. one that you can modify.
Return a newly allocated mstring of k characters, where k defaults to 0. If char is given, then all elements of the string are initialized to char, otherwise the contents of the string are unspecified.
The 1-argument version is deprecated as poor style, except when k is 0.
Rationale: In many languages the most common pattern for mutable strings is to allocate an empty string and incrementally append to it. It seems natural to initialize the string with (make-string), rather than (make-string 0).
To return an immutable string that repeats k times a character
char use string-repeat
.
This is as R7RS, except the result is variable-size and we allow leaving out k when it is zero.
Returns a newly allocated mutable (mstring) copy of the part of the given string between start and end.
The following procedures modify a mutable string.
This procedure stores char in element k of string.
(define s1 (make-string 3 #\*))
(define s2 "***")
(string-set! s1 0 #\?) ⇒ void
s1 ⇒ "?**"
(string-set! s2 0 #\?) ⇒ error
(string-set! (symbol->string 'immutable) 0 #\?) ⇒ error
Performance note: Calling string-set!
may take time proportional
to the length of the string: First it must scan for the right position,
like string-ref
does. Then if the new character requires
using a surrogate pair (and the old one doesn’t) then we have to make room
in the string, possibly re-allocating a new char
array.
Alternatively, if the old character requires using a surrogate pair
(and the new one doesn’t) then following characters need to be moved.
The function string-set!
is deprecated: It is inefficient,
and it very seldom does the correct thing. Instead, you can
construct a string with string-append!
.
The string must be a mutable string, such as one returned
by make-string
or string-copy
.
The string-append!
procedure extends string
by appending each value (in order) to the end of string.
Each value
should be a character or a string.
Performance note: The compiler converts a call with multiple values
to multiple string-append!
calls.
If a value is known to be a character
, then
no boxing (object-allocation) is needed.
The following example shows how to efficiently process a string
using string-for-each
and incrementally “build” a result string
using string-append!
.
(define (translate-space-to-newline str::string)::string
  (let ((result (make-string 0)))
    (string-for-each
      (lambda (ch)
        (string-append! result
                        (if (char=? ch #\Space) #\Newline ch)))
      str)
    result))
Copies the characters of the string from that are between start and end into the string to, starting at index at. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. (This is achieved without allocating storage by making sure to copy in the correct direction in such circumstances.)
This is equivalent to (and implemented as):
(string-replace! to at (+ at (- end start)) from start end)
(define a "12345")
(define b (string-copy "abcde"))
(string-copy! b 1 a 0 2)
b ⇒ "a12de"
Replaces the characters of string dst (between dst-start and dst-end) with the characters of src (between src-start and src-end). The number of characters from src may be different than the number replaced in dst, so the string may grow or contract. The special case where dst-start is equal to dst-end corresponds to insertion; the case where src-start is equal to src-end corresponds to deletion. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. (This is achieved without allocating storage by making sure to copy in the correct direction in such circumstances.)
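The replacement, insertion and deletion cases described above could be exercised as in the following sketch. It assumes the argument order dst, dst-start, dst-end, src, src-start, src-end (the order used in the string-copy! equivalence), and passes the source range explicitly rather than relying on defaults.

```scheme
;; string-copy returns a mutable mstring that can grow and shrink.
(define s (string-copy "hello world"))

(string-replace! s 0 5 "goodbye" 0 7)   ; replace "hello" with "goodbye"
s                                       ; ⇒ "goodbye world"

(string-replace! s 7 7 "," 0 1)         ; dst-start = dst-end: insertion
s                                       ; ⇒ "goodbye, world"

(string-replace! s 7 8 "" 0 0)          ; src-start = src-end: deletion
s                                       ; ⇒ "goodbye world"
```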
The string-fill!
procedure stores fill in the elements
of string between start and end.
It is an error if fill is not a character or is forbidden in strings.
Using function-call syntax with strings is convenient and efficient. However, it has some “gotchas”.
We will use the following example string:
(! str1 "Smile \x1f603;!")
or if you’re brave:
(! str1 "Smile 😃!")
This is "Smile "
followed by an emoticon (“smiling face with
open mouth”) followed by "!"
.
The emoticon has scalar value \x1f603. It is not in the 16-bit Basic Multilingual Plane, and so it must be encoded by a surrogate pair (#\xd83d followed by #\xde03).
The number of scalar values (characters) is 8, while the number of 16-bit code units (chars) is 9.
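The surrogate pair quoted above follows from the standard UTF-16 encoding rule: subtract #x10000 from the scalar value, then add the high 10 bits to #xD800 and the low 10 bits to #xDC00. The helper surrogate-pair below is a hypothetical illustration of that arithmetic, not a Kawa procedure.

```scheme
;; Hypothetical helper: the UTF-16 surrogate pair for a scalar value
;; above #xFFFF, returned as a list (high low) of integers.
(define (surrogate-pair cp)
  (let ((v (- cp #x10000)))
    (list (+ #xD800 (quotient v #x400))      ; high (leading) surrogate
          (+ #xDC00 (remainder v #x400)))))  ; low (trailing) surrogate

(surrogate-pair #x1F603)   ; ⇒ (55357 56835), that is (#xD83D #xDE03)
```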
The java.lang.CharSequence:length
method
counts char
s. Both the length
and the
string-length
procedures count character
s. Thus:
(length str1) ⇒ 8
(string-length str1) ⇒ 8
(str1:length) ⇒ 9
Counting char
s is a constant-time operation (since it
is stored in the data structure).
Counting character
s depends on the representation used:
In general it may take time proportional to the length of the string, since it has to subtract one for each surrogate pair; however, the istring type (gnu.lists.IString class) uses an extra structure so it can count characters in constant time.
Similarly, we can index the string in 3 ways:
(str1 1) ⇒ #\m :: character
(string-ref str1 1) ⇒ #\m :: character
(str1:charAt 1) ⇒ #\m :: char
Using function-call syntax when the “function” is a string
and a single integer argument is the same as using string-ref
.
Things become interesting when we reach the emoticon:
(str1 6) ⇒ #\😃 :: character
(str1:charAt 6) ⇒ #\xd83d :: char
Both string-ref and the function-call syntax return the real character, while the charAt method returns a partial character.
(str1 7) ⇒ #\! :: character
(str1:charAt 7) ⇒ #\xde03 :: char
(str1 8) ⇒ throws StringIndexOutOfBoundsException
(str1:charAt 8) ⇒ #\! :: char
You can index a string with a list of integer indexes, most commonly a range:
(str [i ...])
is basically the same as:
(string (str i) ...)
Generally when working with strings it is best to work with substrings rather than individual characters:
(str [start <: end])
This is equivalent to invoking the substring
procedure:
(substring str start end)
Indexing into a string (using for example string-ref) can be inefficient because of the possible presence of surrogate pairs. Hence, given an index i, access normally requires linearly scanning the string until we have seen i characters.
The string-cursor API is defined in terms of abstract “cursor values”, which point to a position in the string. This avoids the linear scan.
Typical usage is:
(let* ((str whatever)
       (end (string-cursor-end str)))
  (do ((sc::string-cursor (string-cursor-start str)
                          (string-cursor-next str sc)))
      ((string-cursor>=? sc end))
    (let ((ch (string-cursor-ref str sc)))
      (do-something-with ch))))
Alternatively, the following may be marginally faster:
(let* ((str whatever)
       (end (string-cursor-end str)))
  (do ((sc::string-cursor (string-cursor-start str)
                          (string-cursor-next-quick sc)))
      ((string-cursor>=? sc end))
    (let ((ch (string-cursor-ref str sc)))
      (if (not (char=? ch #\ignorable-char))
          (do-something-with ch)))))
The API is non-standard, but is based on that in Chibi Scheme.
An abstract position (index) in a string.
Implemented as a primitive int
which counts the
number of preceding code units (16-bit char
values).
Returns a cursor for the start of the string.
The result is always 0, cast to a string-cursor
.
Returns a cursor for the end of the string - one past the last valid character.
Implemented as (as string-cursor (invoke str 'length))
.
Return the character
at the cursor.
If the cursor points to the second char
of a surrogate pair,
returns #\ignorable-char
.
Return the cursor position count (default 1) character positions forwards beyond cursor. For each count this may add either 1 or 2 (if pointing at a surrogate pair) to the cursor.
Increment cursor by one raw char
position,
even if cursor points to the start of a surrogate pair.
(In that case the next string-cursor-ref
will
return #\ignorable-char
.)
Same as (+ cursor 1)
but with the string-cursor
type.
Return the cursor position count (default 1) character positions backwards before cursor.
Create a substring of the section of string between the cursors start and end.
Is the position of cursor1 respectively before, before or same, same, after, or after or same, as cursor2.
Performance note: Implemented as the corresponding int
comparison.
Apply the procedure proc to each character position in string between the cursors start and end.