4.2 The Wisent Lex lexer

The lexical analysis step of Semantic is performed by the general function semantic-lex. For more information, see Semantic Language Development.

semantic-lex produces lexical tokens of the form:

(token-class start . end)

token-class

Is a symbol that identifies a lexical token class, like symbol, string, number, or PAREN_BLOCK.

start
end

Are the start and end positions of mapped data in the input buffer.
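
Because such a token is an ordinary cons structure, its parts can be read with the standard list accessors. The following is only a minimal sketch: the token value and its positions are made up for illustration and do not come from a real semantic-lex run.

```elisp
;; A token as semantic-lex might produce it for a symbol occupying
;; buffer positions 1 to 4 (the positions are made up for this sketch).
(let ((token '(symbol 1 . 4)))
  (list (car token)     ; token-class
        (cadr token)    ; start
        (cddr token)))  ; end
;; ⇒ (symbol 1 4)
```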

The Wisent parser does not depend on the nature of the analyzed input stream (buffer, string, etc.), and requires that lexical tokens have a different form (see What the parser must receive):

(token-class value [start . end])

wisent-lex is the default Wisent lexer used in Semantic.

Function: wisent-lex

Return the next available lexical token in Wisent’s form.

The variable wisent-lex-istream contains the list of lexical tokens produced by semantic-lex. Pop the next available token and convert it to a form suitable for the Wisent parser.

The mapping of lexical tokens produced by semantic-lex into equivalent Wisent lexical tokens is straightforward:

(token-class start . end)
     ⇒ (token-class value start . end)

value is the input buffer-substring from start to end.
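
The pop-and-convert step described above can be sketched as follows. This is an illustration under stated assumptions, not the actual definition of wisent-lex: the names my-istream and my-wisent-style-lex are hypothetical, my-istream stands in for wisent-lex-istream, and returning nil on an empty stream is a simplification made for this sketch.

```elisp
;; Hypothetical stand-in for `wisent-lex-istream': a list of tokens in
;; semantic-lex form, (CLASS START . END).
(defvar my-istream nil)

(defun my-wisent-style-lex ()
  "Pop the next token from `my-istream' and return it in Wisent form.
Return nil when the stream is empty (a simplification of this sketch).
Must be called with the analyzed buffer current."
  (let ((token (pop my-istream)))
    (when token
      (let ((class (car token))
            (start (cadr token))
            (end   (cddr token)))
        ;; (CLASS START . END) ⇒ (CLASS VALUE START . END)
        (cons class
              (cons (buffer-substring start end)
                    (cons start end)))))))

;; Usage: lex two made-up tokens over the text "foo 42".
(with-temp-buffer
  (insert "foo 42")
  (setq my-istream '((symbol 1 . 4) (number 5 . 7)))
  (list (my-wisent-style-lex)
        (my-wisent-style-lex)
        (my-wisent-style-lex)))
;; ⇒ ((symbol "foo" 1 . 4) (number "42" 5 . 7) nil)
```

Note that the converted token keeps start and end as a dotted tail, so its printed form matches the (token-class value start . end) shape shown in the mapping.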