4 Texinfo::Document


4.1 Texinfo::Document NAME

Texinfo::Document - Texinfo document tree and information


4.2 Texinfo::Document SYNOPSIS

  use Texinfo::Parser;

  my $parser = Texinfo::Parser::parser();
  my $document = $parser->parse_texi_file("somefile.texi");

  my $indices_information = $document->indices_information();
  my $float_types_arrays = $document->floats_information();
  my $internal_references_array
    = $document->internal_references_information();

  # $identifier_target is a hash reference on normalized
  # node/float/anchor names.
  my $identifier_target = $document->labels_information();

  # A hash reference, keys are @-command names, value is an
  # array reference holding all the corresponding @-commands.
  # Also contains dircategory and direntry list.
  my $global_commands_information
                 = $document->global_commands_information();

  # a hash reference on document information (encodings,
  # input file name, for example).
  my $global_information = $document->global_information();

4.3 Texinfo::Document NOTES

The main purpose of the Texinfo Perl modules is to be used in texi2any to convert Texinfo to other formats. There is no promise of API stability.


4.4 Texinfo::Document DESCRIPTION

This module represents parsed Texinfo documents, with the Texinfo tree and associated information. In general, a document is obtained from a Texinfo parser call; there is no need to set up the document yourself.


4.5 Texinfo::Document METHODS


4.5.1 Getting document information

The main purpose of Texinfo::Document methods is to retrieve information on a Texinfo document.

The Texinfo tree obtained by parsing a Texinfo document is available through tree:

$tree = tree($document, $handler_only)

The $tree is a hash reference. It is described in TEXINFO TREE.

If $handler_only is set and XS extensions are used, the returned tree holds a reference to the C Texinfo tree data only, but no actual Perl Texinfo tree. This avoids building the Perl tree if all the functions called with the tree as argument have XS interfaces and directly use the C data and do not use the Perl tree.
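
As a rough illustration of working with the Perl tree, the sketch below counts @-commands by walking a small hand-built hash reference. The cmdname, text and contents keys and the count_commands helper are assumptions made for this example; the actual element layout is the one described in TEXINFO TREE, and a real tree would come from a call to tree.

```perl
use strict;
use warnings;

# Hand-built stand-in for a parsed tree; the element keys are assumptions.
my $tree = {
  'contents' => [
    { 'cmdname' => 'node', 'contents' => [ { 'text' => 'Top' } ] },
    { 'cmdname' => 'chapter',
      'contents' => [
        { 'text' => 'Some ' },
        { 'cmdname' => 'code', 'contents' => [ { 'text' => 'x' } ] },
      ] },
    { 'cmdname' => 'node', 'contents' => [ { 'text' => 'Other' } ] },
  ],
};

# Hypothetical helper: recursively count elements by @-command name.
sub count_commands {
  my ($element, $counts) = @_;
  $counts->{$element->{'cmdname'}}++ if (defined($element->{'cmdname'}));
  foreach my $child (@{$element->{'contents'} || []}) {
    count_commands($child, $counts);
  }
  return $counts;
}

my $counts = count_commands($tree, {});
foreach my $cmdname (sort(keys(%$counts))) {
  print "\@$cmdname: $counts->{$cmdname}\n";
}
```

Such a walk needs the Perl tree, so it would not work on a tree obtained with $handler_only set.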

Some global information is available through global_information:

$info = global_information($document)

The $info returned is a hash reference. The possible keys are

included_files

An array of included file paths as they appear in the document. Binary strings. From both @include and @verbatiminclude.

input_encoding_name

The name of the encoding used for the Texinfo code.

input_file_name
input_directory

The name of the main Texinfo input file and the associated directory. Binary strings. In texi2any, they should come from the command line (and can be decoded with the encoding in the customization variable COMMAND_LINE_ENCODING).
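
Because the file names are binary strings, they need to be decoded before being treated as text. The following is a minimal sketch on a hand-built hash standing in for the value returned by global_information; the sample file name and the 'UTF-8' value used for COMMAND_LINE_ENCODING are assumptions made for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for the hash returned by global_information. The file name
# holds the UTF-8 bytes of "resume.texi" written with two accented e.
my $info = {
  'input_file_name' => "r\xC3\xA9sum\xC3\xA9.texi",
  'input_directory' => '.',
  'input_encoding_name' => 'utf-8',
  'included_files' => [],
};

# Assumed value of the COMMAND_LINE_ENCODING customization variable.
my $command_line_encoding = 'UTF-8';

# Decode the binary string into a Perl character string.
my $file_name = decode($command_line_encoding, $info->{'input_file_name'});

print length($info->{'input_file_name'}), " bytes, ",
      length($file_name), " characters\n";
```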

Some command lists are available, such that it is possible to go through the corresponding tree elements without walking the tree. They are available through global_commands_information:

$commands = global_commands_information($document)

$commands is a hash reference. The keys are @-command names. The associated values are array references containing all the corresponding tree elements.

The following list of commands is also available as a key:

dircategory_direntry

An array of successive @dircategory and @direntry as they appear in the document.
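
The sketch below shows one way such a hash could be inspected without walking the tree. The data is hand-built and the elements are reduced to a cmdname key, which is an assumption made for the example; real values are Texinfo tree elements.

```perl
use strict;
use warnings;

# Stand-in for the hash returned by global_commands_information.
my $commands = {
  'contents' => [ { 'cmdname' => 'contents' } ],
  'footnote' => [ { 'cmdname' => 'footnote' }, { 'cmdname' => 'footnote' } ],
  'dircategory_direntry' => [
    { 'cmdname' => 'dircategory' },
    { 'cmdname' => 'direntry' },
  ],
};

# Count the elements collected for each key.
my %element_counts;
foreach my $name (sort(keys(%$commands))) {
  $element_counts{$name} = scalar(@{$commands->{$name}});
  print "$name: $element_counts{$name}\n";
}

# The dircategory_direntry list keeps the document order.
my @dir_order;
foreach my $element (@{$commands->{'dircategory_direntry'}}) {
  push @dir_order, $element->{'cmdname'};
}
print join(' ', @dir_order), "\n";
```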

All the @-commands that have an associated label (so can be the target of cross references) -- @node, @anchor and @float with label -- have an associated normalized name, constructed as described in the HTML Xref node in the Texinfo documentation. Those normalized labels and their association with @-commands are available through labels_information:

$identifier_target = labels_information($document)

$identifier_target is a hash reference whose keys are normalized labels, and the associated value is the corresponding @-command.

$labels_list = labels_list($document)

$labels_list is a list of Texinfo tree command elements that could be the target of cross references.
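
A lookup in the hash returned by labels_information could look like the following sketch. The hash is hand-built, the elements are reduced to a cmdname key and describe_target is a hypothetical helper, all assumptions made for the example.

```perl
use strict;
use warnings;

# Stand-in for the hash returned by labels_information: keys are
# normalized labels, values are the target elements (simplified here).
my $identifier_target = {
  'Top' => { 'cmdname' => 'node' },
  'Getting-Started' => { 'cmdname' => 'node' },
  'overview-figure' => { 'cmdname' => 'float' },
};

# Hypothetical helper: describe the target of a normalized label.
sub describe_target {
  my ($targets, $normalized) = @_;
  my $element = $targets->{$normalized};
  if (!defined($element)) {
    return sprintf("no target for '%s'", $normalized);
  }
  return sprintf("'%s' is defined by \@%s", $normalized,
                 $element->{'cmdname'});
}

print describe_target($identifier_target, 'Getting-Started'), "\n";
print describe_target($identifier_target, 'Missing-Node'), "\n";
```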

Information on @float commands, grouped by float type, is available through floats_information. Each type corresponds to a potential @listoffloats.

$float_types = floats_information($document)

$float_types is a hash reference whose keys are normalized float types (the first float argument, or the @listoffloats argument). The normalization is the same as for the first step of node names normalization. The value is the list of float tree elements of that type appearing in the Texinfo document.

Internal references, nodes and section lists may also be available.

$internal_references_array = internal_references_information($document)

Returns an array reference of the cross-reference @-commands that refer to nodes, anchors or floats within the same document.

$nodes_list = nodes_list($document)

Returns an array reference containing the document nodes. In general set to the nodes list returned by Texinfo::Structuring nodes_tree, by a call to register_document_nodes_list.

$sections_list = sections_list($document)

Returns an array reference containing the document sections. In general set to the sections list returned by Texinfo::Structuring sectioning_structure, by a call to register_document_sections_list.

Information about defined indices, indices merging and index entries is available through indices_information.

$indices_information = $document->indices_information()

$indices_information is a hash reference whose keys are index names. Each associated value is a hash reference with the following keys:

in_code

1 if the index entries should be formatted as code, 0 in the opposite case.

name

The index name.

prefix

An array reference of the prefixes associated with the index.

merged_in

If the index is merged into another index, this key holds the name of the index it is merged into. Indirectly merged indices are taken into account.

index_entries

An array reference containing index entry structures for index entries associated with the index. The index entry could be associated to @-commands like @cindex, or @item in @vtable, or definition commands entries like @deffn.

The keys of the index entry structures are

index_name

The index name associated to the command. Not modified if the corresponding index is merged in another index (with @synindex, for example).

entry_element

The element in the parsed tree associated with the @-command holding the index entry.

entry_number

The number of the index entry.

The following shows the references corresponding to the default indices cp and fn (the fn index having its entries formatted as code) and to the indices defined by the following Texinfo code:

  @defindex some
  @defcodeindex code

  $index_names = {'cp' => {'name' => 'cp', 'in_code' => 0, },
                  'fn' => {'name' => 'fn', 'in_code' => 1, },
                  'some' => {'in_code' => 0},
                  'code' => {'in_code' => 1}};

If name is not set, the index name used as the hash key should be used instead.
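
The structure shown above can be traversed as in the following sketch, which lists the indices whose entries are formatted as code and falls back to the hash key when name is not set. The code_index_names helper is hypothetical.

```perl
use strict;
use warnings;

# The structure shown above, as returned by indices_information.
my $index_names = {
  'cp' => { 'name' => 'cp', 'in_code' => 0 },
  'fn' => { 'name' => 'fn', 'in_code' => 1 },
  'some' => { 'in_code' => 0 },
  'code' => { 'in_code' => 1 },
};

# Hypothetical helper: names of the indices formatted as code.
sub code_index_names {
  my ($indices) = @_;
  my @names;
  foreach my $key (sort(keys(%$indices))) {
    my $index = $indices->{$key};
    next if (!$index->{'in_code'});
    # Fall back to the hash key when name is not set.
    push @names, defined($index->{'name'}) ? $index->{'name'} : $key;
  }
  return @names;
}

print join(', ', code_index_names($index_names)), "\n";
```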


4.5.2 Merging and sorting indices

Merged and sorted document indices are also available. Parsed indices are neither merged nor sorted. Texinfo::Indices functions are called to merge or sort the indices the first time the following methods are called. The results are then associated with the document and simply returned by later calls.

In general, those methods should not be called directly. Instead, the Index sorting Converter methods should be used, as they already call the following functions.

$merged_indices = $document->merged_indices()

Merge indices if needed and return merged indices. The $merged_indices returned is a hash reference whose keys are the index names and whose values are arrays of the index entry structures described in index_entries.

Texinfo::Indices::merge_indices is used to merge the indices.

In general, it is not useful to call this function directly, as it is already called by index sorting functions.
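
To make the effect of merging concrete, the sketch below groups index entries under the index named by merged_in. It only illustrates the idea on hand-built data and is not the implementation of Texinfo::Indices::merge_indices; the group_merged_entries helper and the sample entries are assumptions.

```perl
use strict;
use warnings;

# Stand-in for indices_information when the fn index is merged in cp.
my $indices_information = {
  'cp' => { 'name' => 'cp', 'in_code' => 0,
            'index_entries' => [
              { 'index_name' => 'cp', 'entry_number' => 1 } ] },
  'fn' => { 'name' => 'fn', 'in_code' => 1, 'merged_in' => 'cp',
            'index_entries' => [
              { 'index_name' => 'fn', 'entry_number' => 1 } ] },
};

# Illustration only: collect entries under the index they are merged in.
sub group_merged_entries {
  my ($indices) = @_;
  my %merged;
  foreach my $name (sort(keys(%$indices))) {
    my $index = $indices->{$name};
    next if (!$index->{'index_entries'});
    my $target = defined($index->{'merged_in'})
                   ? $index->{'merged_in'} : $name;
    push @{$merged{$target}}, @{$index->{'index_entries'}};
  }
  return \%merged;
}

my $merged_indices = group_merged_entries($indices_information);
foreach my $name (sort(keys(%$merged_indices))) {
  print "$name: ", scalar(@{$merged_indices->{$name}}), " entries\n";
}
```

Note that the index_name of each entry is left untouched, in line with the description of index_name above.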

$sorted_indices = $document->sorted_indices_by_index($customization_information, $use_unicode_collation, $locale_lang)
$sorted_indices = $document->sorted_indices_by_letter($customization_information, $use_unicode_collation, $locale_lang)

sorted_indices_by_letter returns the indices sorted by index and letter, while sorted_indices_by_index returns the indices with all entries of an index together.

By default, indices are sorted according to the Unicode Collation Algorithm defined in the Unicode Technical Standard #10, without language-specific collation tailoring. If $use_unicode_collation is set to 0, the sorting will not use the Unicode Collation Algorithm and simply sort according to the codepoints. If $locale_lang is set, the language is used for linguistic tailoring of the sorting, if possible.

When sorting by letter, an array reference of letter hash references is associated with each index name. Each letter hash reference has two keys, a letter key with the letter, and an entries key with an array reference of sorted index entries beginning with the letter. The letter is a character string suitable for sorting letters, but is not necessarily the best to use for output.

When simply sorting, the array of the sorted index entries is associated with the index name.
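
The by-letter structure described above can be traversed as in the following sketch. The data is hand-built, the index entries are reduced to an entry_number key and letter_summary is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-in for the result of sorted_indices_by_letter.
my $sorted_indices = {
  'cp' => [
    { 'letter' => 'A', 'entries' => [ { 'entry_number' => 3 } ] },
    { 'letter' => 'B', 'entries' => [ { 'entry_number' => 1 },
                                      { 'entry_number' => 2 } ] },
  ],
};

# Hypothetical helper: one "letter (count)" string per letter of an index.
sub letter_summary {
  my ($by_letter, $index_name) = @_;
  my @summary;
  foreach my $letter_entries (@{$by_letter->{$index_name} || []}) {
    push @summary, sprintf('%s (%d)', $letter_entries->{'letter'},
                           scalar(@{$letter_entries->{'entries'}}));
  }
  return @summary;
}

print join(', ', letter_summary($sorted_indices, 'cp')), "\n";
```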

The optional $customization_information argument is used for error reporting, both to find the Texinfo::Report object to use and to get Texinfo customization variables information. In general, it should be a converter (Getting and setting customization variables) or a document (Getting customization options values registered in document).

Texinfo::Indices::sort_indices_by_index and Texinfo::Indices::sort_indices_by_letter are used to sort the indices, if needed.

In general, those methods should not be called directly. Instead, Texinfo::Convert::Converter::get_converter_indices_sorted_by_index and Texinfo::Convert::Converter::get_converter_indices_sorted_by_letter should be used. The Texinfo::Convert::Converter methods call sorted_indices_by_index and sorted_indices_by_letter.


4.5.3 Getting errors and error registering object

A document has an associated Texinfo::Report object, in which error and warning messages are registered. To get the errors registered in the document, the errors method should be called. It is also possible to get the Texinfo::Report object associated with the document by calling the registrar accessor method.

$registrar = registrar($document)

Returns the Texinfo::Report object associated with the $document.

In general, this is not needed as most functions use the Texinfo::Report object associated with the document automatically. However, some functions take a Texinfo::Report object as an argument, and being able to get the document registrar object is useful in those cases.

($error_warnings_list, $error_count) = errors($document)

This function returns as $error_count the count of errors since setting up the $document (or calling the function). The returned $error_warnings_list is an array of hash references, one for each error, warning or error line continuation. The format of these hash references is described in Texinfo::Report::errors.
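
The returned values could be processed as in the sketch below. The data is hand-built; the type and error_line keys and the sample messages are assumptions made for the example, and Texinfo::Report::errors remains the reference for the actual format. The lines_of_type helper is hypothetical.

```perl
use strict;
use warnings;

# Stand-in for the values returned by errors.
my ($error_warnings_list, $error_count) = (
  [
    { 'type' => 'warning', 'error_line' => "a.texi:3: warning: unused\n" },
    { 'type' => 'error', 'error_line' => "a.texi:7: unknown command\n" },
  ],
  1,
);

# Hypothetical helper: formatted lines of the messages of a given type.
sub lines_of_type {
  my ($messages, $type) = @_;
  my @lines;
  foreach my $message (@$messages) {
    push @lines, $message->{'error_line'} if ($message->{'type'} eq $type);
  }
  return @lines;
}

# Output every message, then a summary if there were errors.
foreach my $message (@$error_warnings_list) {
  print STDERR $message->{'error_line'};
}
my @error_lines = lines_of_type($error_warnings_list, 'error');
print STDERR "$error_count error(s)\n" if ($error_count > 0);
```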


4.5.4 Getting customization options values registered in document

By default, customization information is registered in a document object just after parsing the Texinfo code. Structuring and tree transformation methods then get customization variable values from the document object passed to them as an argument. The customization variables set by default may be a subset selected to be useful for structuring and tree transformation code.

To retrieve Texinfo customization variables you can call get_conf:

$value = $document->get_conf($variable_name)

Returns the value of the Texinfo customization variable $variable_name if the variable value was registered in the document (the value itself may be undef). Returns undef if the variable was not registered.
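
Since get_conf can return undef, callers may want a fallback value. The following sketch uses a minimal MockDocument class that only imitates get_conf so that the example runs on its own; that class and the conf_or_default helper are hypothetical.

```perl
use strict;
use warnings;

# Minimal stand-in class, only for this example. A real document
# comes from a Texinfo parser.
package MockDocument;

sub new {
  my ($class, %options) = @_;
  return bless({ 'options' => { %options } }, $class);
}

sub get_conf {
  my ($self, $variable_name) = @_;
  return $self->{'options'}->{$variable_name};
}

package main;

# Hypothetical helper: use $default when get_conf returns undef.
sub conf_or_default {
  my ($document, $variable_name, $default) = @_;
  my $value = $document->get_conf($variable_name);
  return defined($value) ? $value : $default;
}

my $document = MockDocument->new('DEBUG' => 0);
print conf_or_default($document, 'DEBUG', 1), "\n";
print conf_or_default($document, 'documentlanguage', 'en'), "\n";
```

Testing with defined rather than for truth keeps a registered value of 0 from being replaced by the default.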


4.5.5 Registering document and information in document

The setup of a document is described next. It should only be used in parser code.

$document = Texinfo::Document::register($tree, $global_information, $indices_information, $floats_information, $internal_references_information, $global_commands_information, $identifier_target, $labels_list, $parser_registrar)

Set up a document. There is no reason to call this method outside of parsers, as it is already done by the Texinfo parsers. The arguments are gathered during parsing and correspond to information returned by the other methods.

Further information can be registered in the document.

register_document_nodes_list($document, $nodes_list)

Register the $nodes_list array reference as $document nodes list. This method should be called after the processing of document structure.

register_document_options($document, $options)

The $options hash reference holds options for the document. These options should be Texinfo customization options. Usually, the options registered in the document are those useful for structuring and tree transformation taking place between Texinfo code parsing and conversion to output formats. Indeed, document customization options are mainly accessed by structuring and tree transformation methods (by calling get_conf). The options should in general be registered before the calls to get_conf.

register_document_sections_list($document, $sections_list)

Register the $sections_list array reference as $document sections list. This method should be called after the processing of document structure.

set_document_global_info($document, $key, $value)

Set the $key global information of $document to $value. This method should not be needed in general, as document global information is already set by the Texinfo parser. The information set should be available through the next calls to global_information. The method should in general be called before the calls to global_information.


4.5.6 Methods for Perl and C code interactions

The parsing of Texinfo code, structuring and transformations of the tree called through Texinfo Perl modules may be done by pure Perl modules or by C code called through XS interfaces. In general, it makes no difference whether pure Perl or C code is used. When the document and tree are modified by C code, the Perl structures are automatically rebuilt when calling the accessors described previously. In some cases, however, specific functions need to be called to pass information from C to Perl or perform actions related to C data.

The methods can always be called on pure Perl modules even if they do nothing. Therefore it is, in general, better to call them assuming that modules setting up C data were called, even when it is not the case.

First, document_descriptor can be called to get the document identifier used by C code to retrieve the document data in C. In general this identifier is directly and transparently taken from the document, but may need to be set on other objects in rare cases.

$document_descriptor = $document->document_descriptor()

Returns the document descriptor if the document is available as C data, 0 or undef if not.

When the tree is directly accessed in Perl (not through a document) but is modified by C code, for instance called through Texinfo::Common or Texinfo::Transformations methods, the Perl structures need to be rebuilt from the C data with rebuild_tree:

$rebuilt_tree = rebuild_tree($tree, $no_store)

Return a $rebuilt_tree, rebuilt from C data if needed. If there is no C data, the tree is returned as is. The tree rebuilt is based on the Texinfo parsed document associated to the Texinfo tree $tree.

If the optional $no_store argument is set, remove the C data.

Note that the Perl tree associated to a document is rebuilt from C data when calling $document->tree(). Similarly, the tree is rebuilt when calling other accessors that depend on the document tree. Therefore rebuild_tree should only be called when there is no document associated to a tree and $document->tree() cannot be called to rebuild the tree.

The memory held by the C data associated with a parsed Texinfo document can be released:

remove_document($document)

Remove the C data corresponding to $document.


4.6 Texinfo::Document SEE ALSO

Texinfo::Parser. Texinfo::Structuring.


4.7 Texinfo::Document AUTHOR

Patrice Dumas, <pertusus@free.fr>