For complex string processing, string functions may not be enough, and you need to iterate through a string while processing each (possibly multibyte) character or encoding error in turn. Gnulib has several modules for iterating forward through a string in this way. Backward iteration, that is, from the string’s end to start, is not provided, as it is too hairy in general.
The mbiter module iterates through a string whose length is already known. The string can contain NULs and encoding errors.
The mbiterf module is like mbiter except it is more complex and typically faster.
The mbuiter module iterates through a C string whose length is not known a priori. The string can contain encoding errors and is terminated by the first NUL.
The mbuiterf module is like mbuiter except it is more complex and typically faster.
The mcel module is simpler than mbiter and mbuiter and can be faster than even mbiterf and mbuiterf. It can iterate through either strings whose length is known, or C strings, or strings terminated by other ASCII characters < 0x30.
The mcel-prefer module is like mcel except that it causes some other modules to be based on mcel instead of on the mbiter family.
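The difference between iterating through a string of known length and iterating through a C string can be sketched in plain ISO C. The following is only an illustration under stated assumptions: it does not use any Gnulib module, it assumes the data is UTF-8, and struct scanned, scan_utf8, count_known_length and count_c_string are hypothetical names invented for this sketch. As in the description of mcel, each encoding error here occupies a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type (not a Gnulib type): one character or one
   encoding error, plus the number of bytes it occupies.  */
struct scanned
{
  uint32_t ch;        /* code point; 0 if ERR is set */
  unsigned char err;  /* 1 for an encoding error, else 0 */
  unsigned char len;  /* bytes consumed, always at least 1 */
};

/* Hypothetical scanner: decode one UTF-8 character from the AVAIL > 0
   bytes at P.  Each encoding error is reported as a single byte.  */
static struct scanned
scan_utf8 (char const *p, size_t avail)
{
  assert (avail > 0);
  unsigned char const *u = (unsigned char const *) p;
  struct scanned bad = { 0, 1, 1 };
  struct scanned g = { u[0], 0, 1 };
  uint32_t min;
  if (u[0] < 0x80)
    return g;
  else if ((u[0] & 0xE0) == 0xC0)
    { g.len = 2; g.ch = u[0] & 0x1F; min = 0x80; }
  else if ((u[0] & 0xF0) == 0xE0)
    { g.len = 3; g.ch = u[0] & 0x0F; min = 0x800; }
  else if ((u[0] & 0xF8) == 0xF0)
    { g.len = 4; g.ch = u[0] & 0x07; min = 0x10000; }
  else
    return bad;
  if (avail < g.len)
    return bad;
  /* Stop at the first byte that is not a continuation byte, so a
     terminating NUL is never read past.  */
  for (size_t i = 1; i < g.len; i++)
    {
      if ((u[i] & 0xC0) != 0x80)
        return bad;
      g.ch = g.ch << 6 | (u[i] & 0x3F);
    }
  /* Reject overlong forms, surrogates, and values above U+10FFFF.  */
  if (g.ch < min || 0x10FFFF < g.ch || (0xD800 <= g.ch && g.ch <= 0xDFFF))
    return bad;
  return g;
}

/* Known length: NULs inside [S, S + LEN) are ordinary characters.  */
static size_t
count_known_length (char const *s, size_t len, size_t *errors)
{
  size_t count = 0;
  *errors = 0;
  for (size_t i = 0; i < len; count++)
    {
      struct scanned g = scan_utf8 (s + i, len - i);
      *errors += g.err;
      i += g.len;
    }
  return count;
}

/* C string: the length is not known, and the first NUL ends the loop.  */
static size_t
count_c_string (char const *s, size_t *errors)
{
  size_t count = 0;
  *errors = 0;
  for (char const *p = s; *p != '\0'; count++)
    {
      struct scanned g = scan_utf8 (p, SIZE_MAX);
      *errors += g.err;
      p += g.len;
    }
  return count;
}
```

With the three bytes 'a', NUL, 'b', the known-length loop visits three characters, whereas the C string loop stops after the first one.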
The choice of modules depends on the application’s needs. The mbiter module family is more suitable for applications that treat some sequences of two or more bytes as a single encoding error, and for applications that need to support obsolescent encodings on non-GNU platforms, such as CP864, EBCDIC, Johab, and Shift JIS. In this module family, mbuiter and mbuiterf are more suitable than mbiter and mbiterf when arguments are C strings, lengths are not already known, and it is highly likely that only the first few multibyte characters need to be inspected.
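The reason a NUL-terminated iterator suits the "first few characters" case is that it never needs to compute the string's length, which would require reading the entire string. The following is a rough sketch of that idea in plain ISO C, not the mbuiter API: it assumes UTF-8 data, and char_len and skip_chars are hypothetical helpers. For brevity, char_len checks only lead and continuation bytes; it does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 character starting at
   P, where *P is not NUL and P points into a NUL-terminated string.
   A malformed or truncated sequence counts as a single byte.  */
static size_t
char_len (char const *p)
{
  unsigned char c = (unsigned char) *p;
  size_t n = (c < 0x80 ? 1
              : (c & 0xE0) == 0xC0 ? 2
              : (c & 0xF0) == 0xE0 ? 3
              : (c & 0xF8) == 0xF0 ? 4
              : 0);
  if (n == 0)
    return 1;
  /* The terminating NUL is not a continuation byte, so this loop
     never reads past the end of the string.  */
  for (size_t i = 1; i < n; i++)
    if (((unsigned char) p[i] & 0xC0) != 0x80)
      return 1;
  return n;
}

/* Return a pointer just past the first N characters of the C string S,
   or to its terminating NUL if S has fewer than N characters.
   Only the inspected prefix is read; strlen is never called.  */
static char const *
skip_chars (char const *s, size_t n)
{
  assert (s != NULL);
  char const *p = s;
  for (; n > 0 && *p != '\0'; n--)
    p += char_len (p);
  return p;
}
```

A known-length iterator applied to the same C string would first need a strlen call, whose cost grows with the whole string rather than with the number of characters inspected.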
The mcel module is simpler and can be faster than the mbiter family, and is more suitable for applications that do not need the mbiter family’s special features.
The mcel-prefer module is like mcel except that it also causes some other modules, such as mbscasecmp, to use mcel rather than the mbiter family. This can be simpler and faster. However, it does not support the obsolescent encodings, and it may behave differently on data containing encoding errors where behavior is unspecified or undefined, because in mcel each encoding error is a single byte whereas in the mbiter family a single encoding error can contain two or more bytes.
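To see why the size of an encoding error can change results, consider counting the errors in the same bytes under two policies. The sketch below is plain ISO C and assumes UTF-8. It is not the algorithm of either module family: expected_len and count_errors are hypothetical, and the grouping rule (a lead byte together with the continuation bytes that follow it, when they fall short of a complete sequence) is just one possible multi-byte policy chosen for illustration. Overlong forms and surrogates are not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bytes a UTF-8 sequence should have given its lead byte C,
   or 0 if C cannot start a sequence.  */
static size_t
expected_len (unsigned char c)
{
  return (c < 0x80 ? 1
          : (c & 0xE0) == 0xC0 ? 2
          : (c & 0xF0) == 0xE0 ? 3
          : (c & 0xF8) == 0xF0 ? 4
          : 0);
}

/* Count the encoding errors in the LEN bytes at BUF.
   If GROUP is false, every error is a single byte.
   If GROUP is true, an incomplete sequence (a lead byte plus the
   continuation bytes present) counts as one error (hypothetical rule).  */
static size_t
count_errors (char const *buf, size_t len, bool group)
{
  assert (buf != NULL || len == 0);
  unsigned char const *u = (unsigned char const *) buf;
  size_t errors = 0;
  for (size_t i = 0; i < len; )
    {
      size_t want = expected_len (u[i]);
      size_t have = 1;
      while (have < want && i + have < len
             && (u[i + have] & 0xC0) == 0x80)
        have++;
      if (want != 0 && have == want)
        i += want;               /* a complete character */
      else
        {
          errors++;
          i += group ? have : 1; /* size of this one error */
        }
    }
  return errors;
}
```

For the four bytes 'A', 0xE2, 0x82, 'B', the single-byte policy reports two errors and the grouping policy reports one, so code that compares or counts characters can give different answers on such data.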
If a package uses mcel-prefer, it may also want to give gnulib-tool one or more of the options --avoid=mbiter, --avoid=mbiterf, --avoid=mbuiter and --avoid=mbuiterf, to avoid packaging modules that are not needed.