3 Texinfo::Parser


3.1 Texinfo::Parser NAME

Texinfo::Parser - Parse Texinfo code into a Perl tree


3.2 Texinfo::Parser SYNOPSIS

  use Texinfo::Parser;

  my $parser = Texinfo::Parser::parser();
  my $document = $parser->parse_texi_file("somefile.texi");

  my ($errors, $errors_count) = $document->parser_errors();
  foreach my $error_message (@$errors) {
    warn $error_message->{'error_line'};
  }

3.3 Texinfo::Parser NOTES

The main purpose of the Texinfo Perl modules is to be used in texi2any to convert Texinfo to other formats. There is no promise of API stability.


3.4 Texinfo::Parser DESCRIPTION

Texinfo::Parser parses Texinfo text into a Perl tree. In one pass, it expands user-defined @-commands, conditionals (@ifset, @ifinfo...) and @value, and constructs the tree. Some extra information is gathered while building the tree: for example, the @quotation associated with an @author command, the number of columns in a multitable, or the node associated with a section.


3.5 Texinfo::Parser METHODS

No method is exported by default. The module allows both an object-oriented syntax and a traditional function syntax, with the parser as an opaque data structure given as the first argument to every function.


3.5.1 Initialization

The following method is used to construct a new Texinfo::Parser object:

$parser = Texinfo::Parser::parser($options)

This method creates a new parser. The options may be provided as a hash reference. Most of those options correspond to Texinfo customization options described in the Texinfo manual.

CPP_LINE_DIRECTIVES

If set, handle cpp-like synchronization lines. Set by default.

EXPANDED_FORMATS

An array reference of the output formats for which @ifFORMAT conditional blocks should be expanded. Default is empty.

FORMAT_MENU

Possible values are nomenu, menu and sectiontoc. Menu-related errors are only reported for menu.

INCLUDE_DIRECTORIES

An array reference of directories in which @include files should be searched for. The default contains the working directory, '.'.

IGNORE_SPACE_AFTER_BRACED_COMMAND_NAME

If set, spaces after the name of an @-command that takes braces are ignored. Set by default.

MAX_MACRO_CALL_NESTING

Maximum number of nested user-defined macro calls. The default is 100000.

documentlanguage

A string corresponding to a document language set by @documentlanguage. It overrides the document @documentlanguage information, if present.

values

A hash reference whose keys are flag names and whose values are the corresponding values, equivalent to values set with @set.
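
As a sketch, the options above can be gathered in a hash reference and passed to Texinfo::Parser::parser(). The include directory, output format, document language and @set values below are hypothetical examples; the parser construction is guarded with eval so the snippet also runs where Texinfo::Parser is not installed.

```perl
use strict;
use warnings;

# Hypothetical option values; the keys are the parser options listed above.
my $parser_options = {
  INCLUDE_DIRECTORIES => ['.', 'include'],   # searched for @include files
  EXPANDED_FORMATS    => ['html'],           # expand @ifhtml conditionals
  FORMAT_MENU         => 'menu',             # report menu-related errors
  documentlanguage    => 'fr',               # overrides @documentlanguage
  values              => {'VERSION' => '7.1', 'draft' => ''},  # like @set
};

# Build a parser only if the module is available in this Perl installation.
my $parser;
if (eval { require Texinfo::Parser; 1 }) {
  $parser = Texinfo::Parser::parser($parser_options);
}
```

The same hash reference can be reused to create several parsers with identical settings.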


3.5.2 Parsing Texinfo text

Different methods may be called to parse some Texinfo code: parse_texi_line for a line, parse_texi_piece for a fragment of Texinfo, parse_texi_text for a string corresponding to a full document and parse_texi_file for a file. The first argument of these functions is a parser.

When parse_texi_line is used, the resulting tree is rooted at a root_line type container. Otherwise, the resulting tree should be rooted at a document_root type container.

$tree = $parser->parse_texi_line($text, $first_line_number)

This function is used to parse a short fragment of Texinfo code.

$text is the string containing the Texinfo line. $first_line_number is the line number of the line; if undef, it is set to 1.

$document = $parser->parse_texi_piece($text, $first_line_number)

This function is used to parse Texinfo fragments.

$text is the string containing the Texinfo text. $first_line_number is the line number of the first text line; if undef, it is set to 1.

$document = $parser->parse_texi_text($text, $first_line_number)

This function is used to parse a text as a whole document.

$text is the string containing the Texinfo text. $first_line_number is the line number of the first text line; if undef, it is set to 1.

$document = $parser->parse_texi_file($file_name)

The file with name $file_name is considered to be a Texinfo file and is parsed into a tree. $file_name should be a binary string.

The errors collected while parsing are available through the parser_errors method of the resulting document. They are internally registered in a Texinfo::Report object.

($error_warnings_list, $error_count) = $document->parser_errors()

This function returns the number of parsing errors as $error_count. $error_warnings_list is a reference to an array of hash references, one for each error, warning or error line continuation. They are described in detail in Texinfo::Report::errors.
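
The error list can be post-processed as ordinary Perl data. The sketch below splits messages into errors and warnings; it assumes, following the Texinfo::Report documentation, that each hash has a type key ('error' or 'warning') and an error_line key. The two messages are made-up sample data, not real parser output, and split_messages is a hypothetical helper.

```perl
use strict;
use warnings;

# Split a parser_errors() list into error and warning lines.  Assumes each
# hash has a 'type' key ('error' or 'warning') and an 'error_line' key
# holding the formatted message, as described in Texinfo::Report.
sub split_messages {
  my ($error_warnings_list) = @_;
  my (@errors, @warnings);
  foreach my $message (@$error_warnings_list) {
    if ($message->{'type'} eq 'error') {
      push @errors, $message->{'error_line'};
    } else {
      push @warnings, $message->{'error_line'};
    }
  }
  return (\@errors, \@warnings);
}

# Hypothetical messages shaped like the hashes returned by parser_errors().
my $messages = [
  {'type' => 'warning', 'error_line' => "doc.texi:3: warning: example warning\n"},
  {'type' => 'error',   'error_line' => "doc.texi:7: example error\n"},
];
my ($errors, $warnings) = split_messages($messages);
print scalar(@$errors), " error(s), ", scalar(@$warnings), " warning(s)\n";
```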


3.6 TEXINFO TREE

A Texinfo tree element (called element because node is overloaded in the Texinfo world) is a hash reference. There are three main categories of tree elements. Tree elements associated with an @-command have a cmdname key holding the @-command name. Tree elements corresponding to text fragments have a text key holding the corresponding text. Finally, the last category is other elements, which in most cases have a type key holding their name. Text fragments and @-command elements may also have an associated type when such information is needed.

The children of an @-command or of another container element are in the array referred to by the args key or by the contents key. The args key is for arguments of @-commands, either in braces or on the rest of the line after the command, depending on the type of command. The contents key array holds the contents of the Texinfo code appearing within a block @-command, within a container, or within a @node or sectioning @-command.

Another important key for the elements is the extra key, which is associated with a hash reference holding all kinds of information gathered during parsing that may help with the conversion.

You can see examples of the tree structure by running makeinfo like this:

  makeinfo -c DUMP_TREE=1 -c TEXINFO_OUTPUT_FORMAT=parse document.texi

For a simpler, more regular representation of the tree structure, you can do:

  makeinfo -c TEXINFO_OUTPUT_FORMAT=debugtree document.texi
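
Because every element falls into one of the three categories above, and children are only reached through args and contents, a recursive walk is short. The sketch below counts elements per category on a small hand-built tree for @code{in code} inside a paragraph. The helper names are hypothetical, and the walk deliberately never follows the parent key, which would lead back up the tree.

```perl
use strict;
use warnings;

# Classify a tree element into the three categories described above.
sub element_category {
  my ($element) = @_;
  return 'command' if defined $element->{'cmdname'};
  return 'text'    if defined $element->{'text'};
  return 'other';
}

# Recursively visit an element and its children in 'args', then in
# 'contents', counting elements per category.  'parent' is never followed.
sub count_categories {
  my ($element, $counts) = @_;
  $counts->{element_category($element)}++;
  foreach my $key ('args', 'contents') {
    next unless $element->{$key};
    count_categories($_, $counts) foreach @{$element->{$key}};
  }
  return $counts;
}

# Hand-built, simplified tree (no parent keys) for a paragraph
# containing "@code{in code}".
my $paragraph = {'type' => 'paragraph',
  'contents' => [{'cmdname' => 'code',
                  'args' => [{'type' => 'brace_container',
                              'contents' => [{'text' => 'in code'}]}]}]};
my $counts = count_categories($paragraph, {});
# $counts is {'other' => 2, 'command' => 1, 'text' => 1}
```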

3.6.1 Element keys

cmdname

The command name of @-command elements.

text

The text fragment of text elements.

type

The type of element considered, in general a container. Frequently encountered types are paragraph for a paragraph container, brace_container for the container holding the content of a brace @-command, and line_arg and block_line_arg for the containers holding the arguments appearing on the line of @-commands. Text fragments may have a type indicating the kind of text fragment; for example, spaces_before_paragraph is associated with spaces appearing before the beginning of a paragraph. Most @-command elements do not have an associated type.

args

Arguments in braces or on @-command line. An array reference.

contents

The Texinfo content appearing in the element, for block commands, other containers, @node and sectioning commands. An array reference.

parent

The parent element.

source_info

A hash reference holding information on the location of the element in the Texinfo input manual. It should mainly be available for @-command elements, and only for @-commands considered complex enough that their location in the document is needed, for example to prepare an error message.

The keys of the source_info hash reference are:

line_nr

The line number of the @-command.

file_name

The name of the file in which the @-command appeared.

macro

The name of the user macro the @-command was expanded from.

info

A hash reference holding any other information that cannot be obtained otherwise from the tree. See Information available in the info key.

extra

A hash reference holding information that could also be obtained from the tree, but is directly associated to the element to simplify downstream code. See Information available in the extra key.
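
As an illustration of how source_info can be used, the hypothetical helper below formats a location prefix for a message. It assumes that line_nr is set whenever source_info is present and that macro is false when the @-command does not come from a macro expansion; the element is hand-built.

```perl
use strict;
use warnings;

# Format a "file:line: " prefix from an element's source_info, mentioning
# the user macro when the @-command comes from a macro expansion.
# Elements without source_info yield an empty prefix.
sub location_prefix {
  my ($element) = @_;
  my $info = $element->{'source_info'};
  return '' unless $info;
  my $file = defined $info->{'file_name'} ? $info->{'file_name'} : '';
  my $prefix = $file . ':' . $info->{'line_nr'} . ': ';
  $prefix .= '(in macro ' . $info->{'macro'} . ') ' if $info->{'macro'};
  return $prefix;
}

# Hand-built element with hypothetical location information.
my $element = {'cmdname' => 'anchor',
               'source_info' => {'file_name' => 'manual.texi',
                                 'line_nr' => 42, 'macro' => 'mymacro'}};
print location_prefix($element), "hypothetical message\n";
```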


3.6.2 Element types


3.6.2.1 Types for command elements

Some types can be associated with @-commands (in addition to cmdname), although usually there will be no type at all. The following are the possible values of type for tree elements for @-commands.

definfoenclose_command

This type is set for an @-command that is redefined by @definfoenclose. The beginning is in {'extra'}->{'begin'} and the end in {'extra'}->{'end'}.

The command name is the info command_name value.

index_entry_command

This is the type of index entry commands like @cindex and, more importantly, of user-defined index entry commands. So for example if there is:

 @defindex foo
  ...

 @fooindex index entry

the @fooindex @-command element will have the index_entry_command type.

The command name is the info command_name value.
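
Since user-defined index commands have arbitrary names, matching on the index_entry_command type rather than on cmdname is a robust way to find them. The sketch below collects their names, from the info command_name value, in a simplified, hand-built tree for the @fooindex example; the function name is hypothetical.

```perl
use strict;
use warnings;

# Collect the names of index entry commands found below an element, using
# the index_entry_command type and the info command_name value.
sub index_command_names {
  my ($element, $names) = @_;
  $names = [] unless $names;
  if (defined $element->{'type'}
      and $element->{'type'} eq 'index_entry_command') {
    push @$names, $element->{'info'}->{'command_name'};
  }
  foreach my $key ('args', 'contents') {
    next unless $element->{$key};
    index_command_names($_, $names) foreach @{$element->{$key}};
  }
  return $names;
}

# Hand-built, simplified fragment: "@fooindex index entry" after
# "@defindex foo".
my $fragment = {'type' => 'before_node_section',
  'contents' => [{'cmdname' => 'fooindex', 'type' => 'index_entry_command',
                  'info' => {'command_name' => 'fooindex'},
                  'args' => [{'type' => 'line_arg',
                              'contents' => [{'text' => 'index entry'}]}]}]};
my $found = index_command_names($fragment);
```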


3.6.2.2 Types for text elements

The text elements may have the following types (or may have no type at all):

after_menu_description_line
space_at_end_menu_node

space_at_end_menu_node is the space after the node in a menu entry when there is no description; after_menu_description_line is the space appearing after the description line.

delimiter
spaces

spaces are the spaces on a definition command line separating the definition command arguments. delimiter elements are delimiters, such as commas, square brackets and parentheses, appearing in definition command line arguments at the end of the line and separated from the surrounding text during parsing.

empty_line

An empty line (possibly containing whitespace characters only).

ignorable_spaces_after_command

Spaces appearing after an @-command without braces that does not take an argument on the line, but which is followed by ignorable spaces, such as @item in @itemize or @multitable, or @noindent.

spaces_after_close_brace

Spaces appearing after a closing brace, for some rare commands for which this space should be ignorable (like @caption or @sortas).

spaces_before_paragraph

Space appearing before a paragraph beginning.

raw

Text in an environment where it should be kept as is (in @verbatim, @verb, @macro body).

rawline_arg

Used for the arguments to some special line commands whose arguments aren’t subject to the usual macro expansion, for example @set, @clickstyle, @unmacro or @comment. The argument is in the text key.

spaces_at_end

Space within an index @-command before an @-command interrupting the index command.

text_after_end

Text appearing after @bye.

text_before_beginning

Text appearing before the real content, including the \input texinfo.tex line.

untranslated

English text added by the parser that may need to be translated during conversion. This happens for definition line @-command aliases that lead to prepending text such as “Function”.
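
Converters often need to drop some of the text types above. The sketch below concatenates text while skipping a set of ignorable types. Which types to skip is an application choice; the set, the gather_text helper and the sample tree are assumptions for illustration.

```perl
use strict;
use warnings;

# Example set of text element types, from the list above, that this sketch
# treats as ignorable.  A real converter chooses its own set.
my %ignorable = map { $_ => 1 } qw(empty_line spaces_before_paragraph
  ignorable_spaces_after_command spaces_after_close_brace
  text_before_beginning text_after_end);

# Concatenate the text of all text elements below $element, in tree order,
# skipping text elements whose type is in %ignorable.
sub gather_text {
  my ($element) = @_;
  if (defined $element->{'text'}) {
    return '' if defined $element->{'type'} and $ignorable{$element->{'type'}};
    return $element->{'text'};
  }
  my $result = '';
  foreach my $key ('args', 'contents') {
    next unless $element->{$key};
    $result .= gather_text($_) foreach @{$element->{$key}};
  }
  return $result;
}

# Hand-built, simplified tree: an empty line followed by a paragraph.
my $tree = {'type' => 'before_node_section', 'contents' => [
  {'type' => 'empty_line', 'text' => "\n"},
  {'type' => 'paragraph', 'contents' => [{'text' => "Some text.\n"}]}]};
print gather_text($tree);   # prints "Some text." followed by a newline
```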


3.6.2.3 Tree container elements

Some types of element are containers of portions of the tree, either for the whole tree, or for contents appearing before @node and sectioning commands.

before_node_section

Content before nodes and sectioning commands at the beginning of document_root.

document_root
root_line

root_line is the type of the root tree when parsing Texinfo line fragments using parse_texi_line. document_root is the document root otherwise.

The first content of document_root should be before_node_section, followed by the @node and sectioning @-command elements, the @bye element and postamble_after_end.

postamble_after_end

This container holds everything appearing after @bye.

preamble_before_beginning

This container holds everything appearing before the first content, including the \input texinfo.tex line and following blank lines.

preamble_before_setfilename

This container holds everything that appears before @setfilename.

preamble_before_content

This container holds everything appearing before the first formatted content, corresponding to the preamble in the Texinfo documentation.
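
The expected top-level layout of document_root can be checked by listing each child by type or @-command name. The sketch below does so on a hand-built root; real trees hold many more elements, and the top_level_outline helper is hypothetical.

```perl
use strict;
use warnings;

# Describe each top-level child of a document_root element by its
# @-command name (prefixed with '@') or, for containers, by its type.
sub top_level_outline {
  my ($root) = @_;
  my @outline;
  foreach my $child (@{$root->{'contents'}}) {
    push @outline, defined $child->{'cmdname'} ? '@' . $child->{'cmdname'}
                                               : $child->{'type'};
  }
  return @outline;
}

# Hand-built, heavily simplified document_root.
my $root = {'type' => 'document_root', 'contents' => [
  {'type' => 'before_node_section', 'contents' => []},
  {'cmdname' => 'node'}, {'cmdname' => 'chapter'}, {'cmdname' => 'bye'},
  {'type' => 'postamble_after_end', 'contents' => []}]};
my @outline = top_level_outline($root);
# @outline is ('before_node_section', '@node', '@chapter', '@bye',
#              'postamble_after_end')
```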


3.6.2.4 Types of container elements

The other types of element are containers with other elements appearing in their contents. The paragraph container holds normal text from the Texinfo manual outside of any @-commands, and within @-commands with blocks of text (@footnote, @itemize @item, @quotation for example). The preformatted container holds the content appearing in @-commands like @example and the rawpreformatted container holds the content appearing in format commands such as @html. The other containers are more specific.

The types of container element are the following:

balanced_braces

Special type containing balanced braces content (braces included) in the contexts where they are valid and where balanced braces need to be collected to know when a top-level brace command is closed: in @math, in raw output format brace commands, and within brace @-commands in raw output format block commands.

before_defline

A container for content before the first @defline or @deftypeline in @defblock.

before_item

A container for content before the first @item of block @-commands with items (@table, @multitable, @enumerate...).

brace_container
brace_command_context
brace_arg
line_arg
block_line_arg
following_arg

Those containers occur within the args array of @-commands taking an argument. brace_container is used for the argument to commands taking arguments surrounded by braces when the whole text in the braces is in the argument. brace_arg is used for the arguments to commands taking arguments surrounded by braces when the leading and, in most cases, trailing spaces are not part of the argument, and for arguments in braces separated by commas. brace_command_context is used for @-commands with braces that start a new context (@footnote, @caption, @math).

line_arg is used for commands that take the texinfo code on the rest of the line as their argument, such as @settitle, @node, @section. block_line_arg is similar but is used for commands that start a new block (which is to be ended with @end).

following_arg is used for the argument of accent @-commands that do not use braces but instead follow the @-command, possibly after a space, as in:

  @~n
  @ringaccent A

For example

 @code{in code}

leads to

 {'cmdname' => 'code',
  'args' => [{'type' => 'brace_container',
              'contents' => [{'text' => 'in code'}]}]}

bracketed_arg

Bracketed argument, on definition command lines and on the @multitable line.

bracketed_linemacro_arg

Argument of a user-defined linemacro call in brackets. It directly holds the argument text (which does not contain the brackets) and does not contain other elements. It should not appear directly in the tree, as the user-defined linemacro call is replaced by the linemacro body.

def_category
def_class
def_type
def_name
def_typearg
def_arg

Definition line arguments containers corresponding to the different parts of a definition line command. Contains one bracketed_arg, def_line_arg or untranslated_def_line_arg container.

def_line
def_item
inter_def_item

The def_line type is associated with a container within a block definition command. It holds the definition line arguments in block_line_arg. A @def* line command such as @deffnx or @defline also holds the definition line arguments, in line_arg. The type of each definition line argument element describes the meaning of the element. It is one of def_category, def_name, def_class, def_type, def_arg, def_typearg, spaces or delimiter, depending on the definition.

The container with type def_item holds the definition text content. Content appearing before the x form of a definition command (such as @deffnx) is in an inter_def_item container.

def_line_arg
untranslated_def_line_arg

The def_line_arg contains one or several elements that together form a single unit on a definition command line. This container is very similar to a bracketed_arg on a definition line, except that there are no brackets. It appears in definition line argument containers such as def_category, def_arg or similar.

The untranslated_def_line_arg is similar, but only happens for automatically added categories and contains only a text element. For example, the @deffun line def_category container may contain an untranslated_def_line_arg type container, itself containing a text element with “Function” as text, if the document language demands a translation. Note that the untranslated_def_line_arg is special: in general, it should not be recursed into, since the text within is untranslated; instead, the untranslated text should be gathered when the untranslated_def_line_arg type container itself is converted.

macro_call
macro_call_line
rmacro_call
rmacro_call_line
linemacro_call

Container holding the arguments of a user-defined macro, linemacro or rmacro call. It should not appear directly in the tree, as the user-defined call is expanded. The name of the macro, rmacro or linemacro is the info command_name value. The macro_call_line and rmacro_call_line elements are used when there are no braces and the whole line is the argument.

macro_name
macro_arg

These are taken from the @macro definition and put in the args array of the macro: macro_name is the type of the text fragment corresponding to the macro name, and macro_arg is the type of the text fragments corresponding to the macro formal arguments.

menu_comment

The menu_comment container holds what is between menu entries in menus. For example, in:

  @menu
  Menu title

  * entry::

  Between entries
  * other::
  @end menu

Both

  Menu title

and

  Between entries

will be in a menu_comment.

menu_entry
menu_entry_leading_text
menu_entry_name
menu_entry_separator
menu_entry_node
menu_entry_description

A menu_entry holds a full menu entry, like

  * node::    description.

The different elements of the menu entry are in the menu_entry contents array reference.

menu_entry_leading_text holds the star and following spaces. menu_entry_name is the menu entry name (if present), menu_entry_node corresponds to the node in the menu entry, menu_entry_separator holds the text after the node and before the description, in most cases :: . Lastly, menu_entry_description is for the description.

multitable_head
multitable_body
row

In @multitable, a multitable_head container contains all the rows with @headitem, while multitable_body contains the rows associated with @item. A row container contains the @item and @tab forming a row.

paragraph

A paragraph. The contents of a paragraph (like other container elements for Texinfo content) are elements representing the contents of the paragraph in the order they occur, such as text elements without a cmdname or type, or @-command elements for commands appearing in the paragraph.

preformatted

Texinfo code within a format that is not filled. Happens within some block commands like @example, but also in menu (in menu descriptions, menu comments...).

rawpreformatted

Texinfo code within raw output format block commands such as @tex or @html.

table_entry
table_term
table_definition
inter_item

Those containers appear in @table, @ftable and @vtable. A table_entry container contains an entire row of the table. It contains a table_term container, which holds all the @item and @itemx lines. This is followed by a table_definition container, which holds the content that is to go into the second column of the table.

If there is any content before an @itemx (normally only comments, empty lines or maybe index entries are allowed), it will be in a container with type inter_item at the same level as @item and @itemx, in a table_term.
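
Menu entries can be decomposed by looking at the types of their children. The sketch below extracts the node name and description text from a simplified, hand-built menu_entry for the "* node::    description." example. It assumes the leading text and the separator are text elements carrying a type, and the helper names are hypothetical.

```perl
use strict;
use warnings;

# Concatenate the text elements found directly or indirectly in contents.
sub plain_text {
  my ($element) = @_;
  return $element->{'text'} if defined $element->{'text'};
  return join '', map { plain_text($_) } @{$element->{'contents'} || []};
}

# Return the node name and description text of a menu_entry element by
# looking up its menu_entry_node and menu_entry_description children.
sub menu_entry_parts {
  my ($menu_entry) = @_;
  my %parts;
  foreach my $child (@{$menu_entry->{'contents'}}) {
    next unless defined $child->{'type'};
    $parts{'node'} = plain_text($child)
      if $child->{'type'} eq 'menu_entry_node';
    $parts{'description'} = plain_text($child)
      if $child->{'type'} eq 'menu_entry_description';
  }
  return \%parts;
}

# Hand-built, simplified tree for "* node::    description."
my $entry = {'type' => 'menu_entry', 'contents' => [
  {'type' => 'menu_entry_leading_text', 'text' => '* '},
  {'type' => 'menu_entry_node', 'contents' => [{'text' => 'node'}]},
  {'type' => 'menu_entry_separator', 'text' => '::    '},
  {'type' => 'menu_entry_description', 'contents' => [
    {'type' => 'preformatted', 'contents' => [{'text' => "description.\n"}]}]}]};
my $parts = menu_entry_parts($entry);
```

The description is wrapped in a preformatted container here because menu descriptions are preformatted content, as noted above.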


3.6.3 Information available in the info key

arg_line

The string corresponding to the line after the @-command, for @-commands that have special arguments on their line, and for the @macro line.

command_name

Name of commands that can be defined dynamically: the name of an index command or of a command defined with @definfoenclose (also available in cmdname for those commands). For the element holding the arguments of a user-defined macro, rmacro or linemacro call, the name of the called macro.

delimiter

@verb delimiter is in delimiter.

inserted

Set if the element is not in the Texinfo input code, but is inserted as a default for @-command argument or as a definition command automatically inserted category (for example Function for @defun).

spaces_after_argument

A reference to an element containing the spaces after @-command arguments, before a comma, a closing brace or the end of line, for some @-commands with braces and for bracketed content, for line commands and block command lines taking Texinfo as argument, and for comma-delimited arguments. Depending on the @-command, spaces_after_argument is associated with the @-command element, or with each argument element.

spaces_after_cmd_before_arg

For accent commands with spaces following the @-command, like:

 @ringaccent A
 @^ u

there is a spaces_after_cmd_before_arg key linking to an element containing the spaces appearing after the command in text.

Space between a brace @-command name and its opening brace also ends up in spaces_after_cmd_before_arg. It is not recommended to leave space between an @-command name and its opening brace.

spaces_before_argument

A reference to an element containing the spaces following the opening brace of some @-commands with braces and bracketed content type, spaces following @-commands for line commands and block command taking Texinfo as argument, and spaces following comma delimited arguments. For context brace commands, line commands and block commands, spaces_before_argument is associated with the @-command element, for other brace commands and for spaces after comma, it is associated with each argument element.
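
The spaces keys in info make it possible to rebuild the original spacing of arguments. The sketch below regenerates the source of a comma-separated brace command from a hand-built @abbr tree. It assumes, as described above for brace commands other than context commands, that the spaces elements are attached to each argument element and hold their spaces in a text key; the argument_source helper is hypothetical.

```perl
use strict;
use warnings;

# Rebuild the source text of one argument element, adding back the spaces
# recorded in its info hash.  This sketch assumes the argument only holds
# text elements.
sub argument_source {
  my ($arg) = @_;
  my $info = $arg->{'info'} || {};
  my $text = '';
  $text .= $info->{'spaces_before_argument'}->{'text'}
    if $info->{'spaces_before_argument'};
  $text .= join '', map { $_->{'text'} } @{$arg->{'contents'} || []};
  $text .= $info->{'spaces_after_argument'}->{'text'}
    if $info->{'spaces_after_argument'};
  return $text;
}

# Hand-built, simplified tree for "@abbr{ABBR,  An abbreviation }".
my $abbr = {'cmdname' => 'abbr', 'args' => [
  {'type' => 'brace_arg', 'contents' => [{'text' => 'ABBR'}]},
  {'type' => 'brace_arg', 'contents' => [{'text' => 'An abbreviation'}],
   'info' => {'spaces_before_argument' => {'text' => '  '},
              'spaces_after_argument'  => {'text' => ' '}}}]};
my $source = '@abbr{'
  . join(',', map { argument_source($_) } @{$abbr->{'args'}}) . '}';
# $source is '@abbr{ABBR,  An abbreviation }'
```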


3.6.4 Information available in the extra key


3.6.4.1 Extra keys available for more than one @-command

element_node

The node element in the parsed tree containing the element. Set for @-commands elements that have an associated index entry and for @nodedescription.

element_region

The region command (@copying, @titlepage) containing the element, if it is in such an environment. Set for @-command elements that have an associated index entry and for @anchor.

index_entry

The index entry information is associated with @-commands that have an associated index entry. The associated information should not be accessed directly; instead, Texinfo::Common::lookup_index_entry should be called on the extra index_entry value:

   my ($index_entry, $index_info)
    = Texinfo::Common::lookup_index_entry(
                        $element->{'extra'}->{'index_entry'},
                        $indices_information);

The $indices_information is the information on the indices of a Texinfo manual obtained from Texinfo::Document::indices_information. The index entry information hash returned by Texinfo::Common::lookup_index_entry is described in Texinfo::Document index_entries.

Currently, the index_entry value is an array reference with an index name as first element and the index entry number in that index (1-based) as second element.

index_ignore_chars

A string containing the characters flagged as ignored in key sorting in the document by setting flags such as txiindexbackslashignore. Set, if not empty, for @-commands elements that have an associated index entry.

misc_args

An array holding strings, the arguments of @-commands taking simple textual arguments, like @everyheadingmarks, @frenchspacing, @alias, @synindex, @columnfractions.

text_arg

The string corresponding to the line after the @-command, for @-commands that have an argument interpreted as simple text, like @setfilename, @end or @documentencoding.


3.6.4.2 Extra keys specific of certain @-commands or containers

@abbr
@acronym

The normalized first argument is in normalized.

@anchor
@float

@-commands that are targets for cross-references have a normalized key for the normalized label, built as specified in the Texinfo documentation in the HTML Xref node. There is also a node_content key for an element holding the corresponding content.

@author

If in a @titlepage, the titlepage is in titlepage, if in @quotation or @smallquotation, the corresponding tree element is in quotation.

The author tree element is in the authors array of the @titlepage or the @quotation or @smallquotation it is associated with.

@click

In clickstyle there is the current clickstyle command.

def_line
line definition command

def_command holds the line definition command name, without x if the line definition command is an x form of a block definition command. For a def_line container, def_command holds the command name associated with the def_line. original_def_cmdname is the original def command name.

If the element is a definition line command and is an x form of a block definition command, it has not_after_command set if not appearing after the block definition command without x.

The def_index_element is a Texinfo tree element corresponding to the index entry associated to the definition line, based on the name and class. If needed this element is based on translated strings. In that case, if @documentlanguage is defined where the element is located, documentlanguage holds the documentlanguage value. def_index_ref_element is similar, but not translated, and only set if there could have been a translation.

The omit_def_name_space key value is set and true if the Texinfo variable txidefnamenospace was set, signaling that the space between function definition name and arguments should be omitted.

@definfoenclose defined commands

begin holds the string beginning the @definfoenclose, end holds the string ending the @definfoenclose.

@documentencoding

The normalized argument is in input_encoding_name.

@enumerate

The enumerate_specification extra key contains the enumerate argument.

@float
@listoffloats

If @float has a first argument, and for the @listoffloats argument, there is a float_type key holding the normalized float type.

caption and shortcaption hold the corresponding tree elements associated to a @float. The @caption or @shortcaption have the float tree element stored in float.

index entry @-command
@subentry

If an index entry @-command, such as @cindex, or a @subentry contains a @sortas command, sortas holds the @sortas command content formatted as plain text.

subentry links to the next level @subentry element. subentry_parent links to the previous level element.

Index entry @-commands (but not @subentry) can also have seentry and seealso keys that link to the corresponding @-command elements.

@inlinefmt
@inlineraw
@inlinefmtifelse
@inlineifclear
@inlineifset

The first argument is in format. If an argument has been determined as being expanded by the Parser, the index of this argument is in expand_index. Index numbering begins at 0, but the first argument is always the format or flag name, so, if set, it should be 1 or 2 for @inlinefmtifelse, and 1 for other commands.

@item in @enumerate or @itemize

The item_number extra key holds the number of this item.

@item and @tab in @multitable

The cell_number extra key holds the index of the column of the cell.

@itemize
@table
@vtable
@ftable

The command_as_argument extra key points to the @-command as argument on the @-command line.

If the command in argument for @table, @vtable or @ftable is @kbd and the context and @kbdinputstyle is such that @kbd should be formatted as code, the command_as_argument_kbd_code extra key is set to 1.

@kbd

code is set depending on the context and @kbdinputstyle.

@macro

invalid_syntax is set if there was an error on the @macro line. The arg_line key of the info hash holds the line after @macro.

menu_entry_node

Extra keys with information about the node entry label, the same as the label information in the extra hash of the @node line_arg explicit direction arguments.

@multitable

The key max_columns holds the maximal number of columns. If there is a @columnfractions as argument, then the columnfractions key is associated with the element for the @columnfractions command.

@node

Explicit directions labels information are available in the line_arg node directions arguments of @node. Each line_arg argument element extra hash node_content key value is an element holding the contents corresponding to the node name. There is also a manual_content key if there is an associated external manual name, and a normalized key for the normalized label, built as specified in the HTML Xref Texinfo documentation node.

If you called Texinfo::Structuring::nodes_tree, the node_directions hash in the @node element extra associates up, next and prev keys to the elements corresponding to the node line directions.

An associated_section key holds the tree element of the sectioning command that follows the node. A node_preceding_part key holds the tree element of the @part that precedes the node, if there is no sectioning command between the @part and the node. A node_description key holds the first @nodedescription associated with the node.

A node containing a menu has a menus key, which refers to an array of references to the menu elements occurring in the node.

The first node containing a @printindex @-command has the isindex key set.

paragraph

The indent or noindent key value is set if the corresponding @-commands are associated with that paragraph.

@part

The next sectioning command tree element is in part_associated_section. The following node tree element is in part_following_node if there is no sectioning command between the @part and the node.

@ref
@xref
@pxref
@inforef

The brace_arg corresponding to the node argument holds information on the label, with the same information in the extra hash as for the @node line_arg explicit directions arguments.

row

The row_number extra key holds the index of the row in the @multitable.

sectioning command

The node preceding the command is in associated_node. The part preceding the command is in associated_part. If the level of the document was modified by @raisesections or @lowersections, the differential level is in level_modifier.

Other extra keys are set when you call Texinfo::Structuring::sectioning_structure.

untranslated_def_line_arg

documentlanguage holds the @documentlanguage value. If there is a translation context, it should be in translation_context.
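
For the @inline* commands, the expand_index extra key can be used directly as an index into args. The sketch below selects the expanded argument of a hand-built @inlinefmtifelse element in which, for illustration, the html format is assumed not to be expanded, so the else argument (index 2) is chosen. The helper name is hypothetical.

```perl
use strict;
use warnings;

# Return the argument element that the parser determined is expanded for an
# @inline* command, using the expand_index extra key, or undef if none is.
sub expanded_inline_argument {
  my ($element) = @_;
  my $index = $element->{'extra'} ? $element->{'extra'}->{'expand_index'}
                                  : undef;
  return undef unless defined $index;
  return $element->{'args'}->[$index];
}

# Hand-built, simplified @inlinefmtifelse{html, HTML text, other text},
# assuming the html format is not expanded.
my $inline = {'cmdname' => 'inlinefmtifelse',
  'extra' => {'format' => 'html', 'expand_index' => 2},
  'args' => [{'type' => 'brace_arg', 'contents' => [{'text' => 'html'}]},
             {'type' => 'brace_arg', 'contents' => [{'text' => 'HTML text'}]},
             {'type' => 'brace_arg', 'contents' => [{'text' => 'other text'}]}]};
my $chosen = expanded_inline_argument($inline);
# $chosen->{'contents'}->[0]->{'text'} is 'other text'
```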


3.7 Texinfo::Parser SEE ALSO

Texinfo manual.


3.8 Texinfo::Parser AUTHOR

Patrice Dumas, <pertusus@free.fr>