  use Texinfo::Parser;

  my $parser = Texinfo::Parser::parser();
  my $document = $parser->parse_texi_file("somefile.texi");

  my $indices_information = $document->indices_information();
  my $float_types_arrays = $document->floats_information();
  my $internal_references_array
    = $document->internal_references_information();
  # $identifier_target is a hash reference on normalized
  # node/float/anchor names.
  my $identifier_target = $document->labels_information();
  # A hash reference, keys are @-command names, value is an
  # array reference holding all the corresponding @-commands.
  # Also contains dircategory and direntry list.
  my $global_commands_information
    = $document->global_commands_information();
  # A hash reference on document information (encodings,
  # input file name, for example).
  my $global_information = $document->global_information();
The main purpose of the Texinfo Perl modules is to be used in texi2any to convert Texinfo to other formats. There is no promise of API stability.
This module is used to represent parsed Texinfo documents, with the Texinfo tree and associated information. In general, a document is obtained from a Texinfo parser call, and there is no need to set up the document.
The main purpose of Texinfo::Document methods is to retrieve information on a Texinfo document.
The Texinfo tree obtained by parsing a Texinfo document is available through tree:
The $tree is a hash reference. It is described in TEXINFO TREE.
If $handler_only is set and XS extensions are used, the returned tree holds a reference to the C Texinfo tree data only, but no actual Perl Texinfo tree. This avoids building the Perl tree if all the functions called with the tree as argument have XS interfaces and directly use the C data and do not use the Perl tree.
Some global information is available through global_information:
The $info returned is a hash reference. The possible keys are
An array of included file paths as they appear in the document. Binary strings. From both @include and @verbatiminclude.
input_encoding_name string is the encoding name used for the Texinfo code.
The name of the main Texinfo input file and the associated directory. Binary strings. In texi2any, they should come from the command line (and can be decoded with the encoding in the customization variable COMMAND_LINE_ENCODING).
Some command lists are available, such that it is possible to go through the corresponding tree elements without walking the tree. They are available through global_commands_information:
$commands is a hash reference. The keys are @-command names. The associated values are array references containing all the corresponding tree elements.
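As an illustration of the structure just described, the following sketch goes through a hand-written hash shaped like the one returned by global_commands_information. The sample data is hypothetical, and the element contents (including the cmdname key) are placeholders assumed for the example; in a real document the values are Texinfo tree elements.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $document->global_commands_information():
# keys are @-command names, values are array references of tree elements.
# The element contents are placeholders, not real tree elements.
my $commands = {
  'footnote' => [ {'cmdname' => 'footnote'}, {'cmdname' => 'footnote'} ],
  'contents' => [ {'cmdname' => 'contents'} ],
};

# Count the elements of each @-command without walking the tree.
my %count_by_command;
foreach my $command_name (sort keys %$commands) {
  $count_by_command{$command_name} = scalar(@{$commands->{$command_name}});
  print "\@$command_name: $count_by_command{$command_name}\n";
}
```

With a real document, the hash reference would come from the accessor instead of being written by hand.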
The following list of commands is also available as a key:
All the @-commands that have an associated label and can therefore be the target of cross references (@node, @anchor and @float with label) have an associated normalized name, constructed as described in the HTML Xref node in the Texinfo documentation. Those normalized labels and their association with @-commands are available through labels_information:
$identifier_target is a hash reference whose keys are normalized labels, and the associated value is the corresponding @-command.
$labels_list is a list of Texinfo tree command elements that could be the target of cross references.
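A lookup in the $identifier_target hash could look like the following sketch. The sample hash is hypothetical, the elements are reduced to placeholder hashes, and the cmdname key and the target_command_name helper are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the $identifier_target hash: keys are
# normalized labels, values are the corresponding @-command elements
# (represented here by minimal placeholder hashes).
my $identifier_target = {
  'Top'         => {'cmdname' => 'node'},
  'Some-anchor' => {'cmdname' => 'anchor'},
};

# Return the @-command name of the target of a normalized label,
# or undef if the label is not defined in the document.
sub target_command_name {
  my ($targets, $normalized_label) = @_;
  my $element = $targets->{$normalized_label};
  return undef unless defined($element);
  return $element->{'cmdname'};
}

my $top_command = target_command_name($identifier_target, 'Top');
my $missing = target_command_name($identifier_target, 'No-such-node');
```

Checking for an undefined result matters because a cross reference may name a label that is not defined in the document.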
Information on @float, grouped by float type, with each type corresponding to a potential @listoffloats, is available through floats_information.
$float_types is a hash reference whose keys are normalized float types (the first float argument, or the @listoffloats argument). The normalization is the same as for the first step of node names normalization. The value is the list of float tree elements appearing in the Texinfo document.
Internal references, nodes and section lists may also be available.
The function returns an array reference of the cross-reference @-commands that refer to nodes, anchors or floats in the same document.
Returns an array reference containing the document nodes. In general set to the nodes list returned by Texinfo::Structuring nodes_tree, by a call to register_document_nodes_list.
Returns an array reference containing the document sections. In general set to the sections list returned by Texinfo::Structuring sectioning_structure, by a call to register_document_sections_list.
Information about defined indices, indices merging and index entries is available through indices_information.
$indices_information is a hash reference. The keys are
1 if the index entries should be formatted as code, 0 in the opposite case.
The index name.
An array reference of prefixes associated with the index.
In case the index is merged to another index, this key holds the name of the index the index is merged into. It takes into account indirectly merged indices.
An array reference containing index entry structures for index entries associated with the index. The index entry could be associated to @-commands like @cindex, or @item in @vtable, or definition commands entries like @deffn.
The keys of the index entry structures are
The following shows the references corresponding to the default indexes cp and fn, the fn index having its entries formatted as code and the indices corresponding to the following texinfo

  @defindex some
  @defcodeindex code

  $index_names = {'cp' => {'name' => 'cp', 'in_code' => 0, },
                  'fn' => {'name' => 'fn', 'in_code' => 1, },
                  'some' => {'in_code' => 0},
                  'code' => {'in_code' => 1}};

If name is not set, it is set to the index name.
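The defaulting rule for name can be sketched as follows, using the $index_names data from the example above. This only illustrates the rule on a plain hash; it is not meant to show where or how the module itself applies it.

```perl
use strict;
use warnings;

# Index information as shown in the example above.
my $index_names = {
  'cp'   => {'name' => 'cp', 'in_code' => 0},
  'fn'   => {'name' => 'fn', 'in_code' => 1},
  'some' => {'in_code' => 0},
  'code' => {'in_code' => 1},
};

# Fill in a missing 'name' with the index name (the hash key).
foreach my $index_name (keys %$index_names) {
  $index_names->{$index_name}->{'name'} //= $index_name;
}

# Collect the names of the indices whose entries are formatted as code.
my @code_indices = grep { $index_names->{$_}->{'in_code'} }
                     sort keys %$index_names;
print "code indices: @code_indices\n";
```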
Merged and sorted document indices are also available. Parsed indices are neither merged nor sorted; Texinfo::Indices functions are called to merge or sort the indices the first time the following methods are called. The results are afterwards associated to the document and simply returned.
In general, those methods should not be called directly. Instead, the Index sorting Converter methods, which already call the following functions, should be used.
Merge indices if needed and return merged indices. The $merged_indices returned is a hash reference whose keys are the index names and whose values are arrays of the index entry structures described in index_entries.
Texinfo::Indices::merge_indices is used to merge the indices.
In general, it is not useful to call this function directly, as it is already called by index sorting functions.
sorted_indices_by_letter returns the indices sorted by index and letter, while sorted_indices_by_index returns the indices with all entries of an index together.
By default, indices are sorted according to the Unicode Collation Algorithm defined in the Unicode Technical Standard #10, without language-specific collation tailoring. If $use_unicode_collation is set to 0, the sorting will not use the Unicode Collation Algorithm and simply sort according to the codepoints. If $locale_lang is set, the language is used for linguistic tailoring of the sorting, if possible.
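The difference between codepoint sorting and Unicode Collation Algorithm sorting can be seen on a few plain strings. This is only a sketch using the Unicode::Collate module distributed with Perl; the sample strings are made up, and the exact calls made by the Texinfo modules are not shown here.

```perl
use strict;
use warnings;

my @keys = ('apple', 'Zebra', 'banana');

# Codepoint sorting: uppercase ASCII letters sort before lowercase ones.
my @by_codepoint = sort { $a cmp $b } @keys;

# Unicode Collation Algorithm sorting without tailoring, if the
# Unicode::Collate module can be loaded; otherwise the list stays empty.
my @by_uca = eval {
  require Unicode::Collate;
  Unicode::Collate->new()->sort(@keys);
};

print "codepoint: @by_codepoint\n";
print "UCA:       @by_uca\n" if (@by_uca);
```

With codepoint sorting, Zebra comes first because Z precedes all lowercase letters; with the Unicode Collation Algorithm the order is alphabetical regardless of case.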
When sorting by letter, an array reference of letter hash references is associated with each index name. Each letter hash reference has two keys, a letter key with the letter, and an entries key with an array reference of sorted index entries beginning with the letter. The letter is a character string suitable for sorting letters, but is not necessarily the best to use for output.
When simply sorting, the array of the sorted index entries is associated with the index name.
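The by-letter structure described above can be traversed as in the following sketch. The sample data is hypothetical and the entries are shown as plain strings for brevity, whereas real entries are index entry structures.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the result of sorted_indices_by_letter:
# each index name maps to an array reference of letter hashes, each
# with a 'letter' key and an 'entries' key.
my $indices_by_letter = {
  'cp' => [
    {'letter' => 'A', 'entries' => ['anchor', 'appendix']},
    {'letter' => 'N', 'entries' => ['node']},
  ],
};

# Build one summary line per letter group, in the sorted order.
my @lines;
foreach my $index_name (sort keys %$indices_by_letter) {
  foreach my $letter_entries (@{$indices_by_letter->{$index_name}}) {
    my $letter = $letter_entries->{'letter'};
    my $count = scalar(@{$letter_entries->{'entries'}});
    push @lines, "$index_name/$letter: $count";
  }
}
print "$_\n" foreach @lines;
```

For the by-index form, the inner loop would run directly over the array of entries associated with the index name.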
The optional $customization_information argument is used for error reporting, both to find the Texinfo::Report object to use and to get Texinfo customization variables information. In general, it should be a converter (Getting and setting customization variables) or a document (Getting customization options values registered in document).
Texinfo::Indices::sort_indices_by_index and Texinfo::Indices::sort_indices_by_letter are used to sort the indices, if needed.
In general, those methods should not be called directly. Instead, Texinfo::Convert::Converter::get_converter_indices_sorted_by_index and Texinfo::Convert::Converter::get_converter_indices_sorted_by_letter should be used. The Texinfo::Convert::Converter methods call sorted_indices_by_index and sorted_indices_by_letter.
A document has an associated Texinfo::Report object, which is used to register error and warning messages. To get the errors registered in the document, the errors method should be called. It is also possible to get the Texinfo::Report object associated with the document by calling the registrar accessor method.
Returns the Texinfo::Report object associated with the $document.
In general, this is not needed as most functions use the Texinfo::Report object associated with the document automatically. However, some functions take a Texinfo::Report object as an argument, and being able to get the document registrar object is useful in those cases.
This function returns as $error_count the count of errors since the $document was set up (or since the function was called). The returned $error_warnings_list is an array of hash references, one for each error, warning or error line continuation. The format of these hash references is described in Texinfo::Report::errors.
By default, customization information is registered in a document object just after parsing the Texinfo code. Structuring and tree transformation methods then get customization variable values from the document object passed to them as an argument. The customization variables set by default may be a subset selected to be useful for structuring and tree transformation code.
To retrieve Texinfo customization variables you can call get_conf:
Returns the value of the Texinfo customization variable $variable_name (possibly undef) if the variable value was registered in the document, or undef otherwise.
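Since get_conf returns undef both for an unset value and for a variable that was not registered, callers may want a fallback. The following sketch assumes a hypothetical conf_or_default helper and uses a hypothetical StubDocument class in place of a real document, so that it can run without the Texinfo modules; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical minimal stand-in for a document object, providing only
# a get_conf method with the behavior described above.
package StubDocument {
  sub new {
    my ($class, %conf) = @_;
    return bless {'conf' => {%conf}}, $class;
  }
  sub get_conf {
    my ($self, $variable_name) = @_;
    return $self->{'conf'}->{$variable_name};
  }
}

# Return the customization variable value, or $default when the value
# is undef (unset, or not registered in the document).
sub conf_or_default {
  my ($document, $variable_name, $default) = @_;
  my $value = $document->get_conf($variable_name);
  return defined($value) ? $value : $default;
}

my $document = StubDocument->new('novalidate' => 1);
my $novalidate = conf_or_default($document, 'novalidate', 0);
my $debug = conf_or_default($document, 'DEBUG', 0);
```

The helper tests definedness rather than truth, so a registered value of 0 is returned as is instead of being replaced by the default.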
The setup of a document is described next. It should only be used in parser code.
Set up a document. There is no reason to call this method outside of parsers, as it is already done by the Texinfo parsers. The arguments are gathered during parsing and correspond to information returned by the other methods.
Further information can be registered in the document.
Register the $nodes_list array reference as $document nodes list. This method should be called after the processing of document structure.
The $options hash reference holds options for the document. These options should be Texinfo customization options. Usually, the options registered in the document contain those useful for structuring and tree transformation taking place between Texinfo code parsing and conversion to output formats. Indeed, document customization options are mainly accessed by structuring and tree transformation methods (by calling get_conf). The options should in general be registered before the calls to get_conf.
Register the $sections_list array reference as $document sections list. This method should be called after the processing of document structure.
Register $value as the $key global information of $document. This method is generally not needed, as document global information is already set by the Texinfo parser. The information set should be available through the next calls to global_information. The method should in general be called before the calls to global_information.
The parsing of Texinfo code, structuring and transformations of the tree called through Texinfo Perl modules may be done by pure Perl modules or by C code called through XS interfaces. In general, it makes no difference whether pure Perl or C code is used. When the document and tree are modified by C code, the Perl structures are automatically rebuilt when calling the accessors described previously. In some cases, however, specific functions need to be called to pass information from C to Perl or perform actions related to C data.
The methods can always be called on pure Perl modules even if they do nothing. Therefore it is, in general, better to call them assuming that modules setting up C data were called, even when it is not the case.
First, document_descriptor can be called to get the document identifier used by C code to retrieve the document data in C. In general this identifier is directly and transparently taken from the document, but may need to be set on other objects in rare cases.
Returns the document descriptor if the document is available as C data, 0 or undef if not.
When the tree is directly accessed in Perl (not through a document) but is modified by C code, for instance called through Texinfo::Common or Texinfo::Transformations methods, the Perl structures need to be rebuilt from the C data with rebuild_tree:
Return a $rebuilt_tree, rebuilt from C data if needed. If there is no C data, the tree is returned as is. The tree rebuilt is based on the Texinfo parsed document associated to the Texinfo tree $tree.
If the optional $no_store argument is set, remove the C data.
Note that the Perl tree associated to a document is rebuilt from C data when calling $document->tree(). Similarly, the tree is rebuilt when calling other accessors that depend on the document tree. Therefore rebuild_tree should only be called when there is no document associated to a tree and $document->tree() cannot be called to rebuild the tree.
Some methods allow releasing the memory held by C data associated to a Texinfo parsed document:
Copyright 2010- Free Software Foundation, Inc. See the source file for all copyright years.
This library is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.