17 Texinfo::Convert::Converter


17.1 Texinfo::Convert::Converter NAME

Texinfo::Convert::Converter - Parent class for Texinfo tree converters


17.2 Texinfo::Convert::Converter SYNOPSIS

  package Texinfo::Convert::MyConverter;

  use Texinfo::Convert::Converter;
  @ISA = qw(Texinfo::Convert::Converter);

  sub converter_defaults ($$) {
    return \%myconverter_defaults;
  }
  sub converter_initialize($) {
    my $self = shift;
    ...
  }

  sub conversion_initialization($;$) {
    my $self = shift;
    my $document = shift;

    if ($document) {
      $self->set_document($document);
    }

    $self->{'document_context'} = [{}];
    ...
  }

  sub conversion_finalization($) {
    my $self = shift;
  }

  sub convert_tree($$) {
    ...
  }

  sub convert($$) {
    my $self = shift;
    my $document = shift;

    $self->conversion_initialization($document);

    ...
    $self->conversion_finalization();
  }

  sub output($$) {
    my $self = shift;
    my $document = shift;

    $self->conversion_initialization($document);

    ...
    $self->conversion_finalization();
    ...
  }

  # end of Texinfo::Convert::MyConverter

  my $converter = Texinfo::Convert::MyConverter->converter();
  $converter->output($texinfo_parsed_document);

17.3 Texinfo::Convert::Converter NOTES

The main purpose of the Texinfo Perl modules is to be used in texi2any to convert Texinfo to other formats. There is no promise of API stability.


17.4 Texinfo::Convert::Converter DESCRIPTION

Texinfo::Convert::Converter is a super class that can be used to simplify converter initialization. The class also provides some useful methods. In turn, the converter should define some methods for conversion. In general convert_tree, output and convert should be defined.

$result = $converter->convert_tree($tree)

The convert_tree method is mandatory and should convert portions of a Texinfo tree. Takes a $converter and a Texinfo tree $tree as arguments. Returns the converted output.

$result = $converter->output($document)
$result = $converter->output_tree($document)

The output method is used by converters as the entry point for conversion to a file with headers and so on. This method should be implemented by converters. output is called from texi2any. output takes a $converter and a Texinfo parsed document Texinfo::Document $document as arguments.

Texinfo::Convert::Converter implements a generic output_tree function suitable for conversion of the Texinfo tree, with the conversion result output into a file or returned from the function. output_tree takes a $converter and a Texinfo parsed document Texinfo::Document $document as arguments. In a converter that uses output_tree, output is in general defined as:

  sub output($$) {
    my $self = shift;
    my $document = shift;

    return $self->output_tree($document);
  }

In general, output and output_tree output to files and return undef. When the output file name is an empty string, however, it is customary for output and output_tree to return the output as a character string instead. The output file name is obtained in output_tree through a call to determine_files_and_directory. In general determine_files_and_directory is also used when output_tree is not used.

$result = $converter->convert($document)

Entry point for the conversion of a Texinfo parsed document to an output format, without the headers usually done when outputting to a file. convert takes a $converter and a Texinfo parsed document Texinfo::Document $document as arguments. Returns the output as a character string. Not mandatory, not called from texi2any, but used in the texi2any test suite.

$result = $converter->convert_output_unit($output_unit)

Can be used for the conversion of output units by converters. convert_output_unit takes a $converter and an output unit $output_unit as arguments. The implementation of convert_output_unit of Texinfo::Convert::Converter could be suitable in many cases. Output units are typically returned by Texinfo::OutputUnits split_by_section or Texinfo::OutputUnits split_by_node.

Two methods, converter_defaults and converter_initialize, are used for initialization, to give information to Texinfo::Convert::Converter. They can be redefined in converters.

To help with the conversion, the set_document function associates a Texinfo::Document to a converter. Other methods are called by the default implementations and can be redefined to run code at specific moments of the conversion. conversion_initialization, for instance, is generally called at the beginning of output, output_tree and convert. conversion_finalization is generally called at the end of output_tree, output and convert. output_tree also calls the conversion_output_begin method before the Texinfo tree conversion to obtain the beginning of the output, and the conversion_output_end method after the Texinfo tree conversion to obtain the end of the output.

For output formats based on output units conversion, the Texinfo::Convert::Plaintext output method could be a good starting point. HTML and Info output are also based on output units conversion. Output units are not relevant for all formats; the Texinfo tree can also be converted directly, in general by using output_tree. This is how the other converters are implemented.

Existing backends based on output_tree may be used as examples. Texinfo::Convert::Texinfo together with Texinfo::Convert::PlainTexinfo, as well as Texinfo::Convert::TextContent are trivial examples. Texinfo::Convert::Text is less trivial, although still simple, while Texinfo::Convert::DocBook is a real converter that is also not too complex.

The documentation of Texinfo::Common, Texinfo::OutputUnits, Texinfo::Convert::Unicode and Texinfo::Convert::Text describes modules or additional functions that may be useful for backends, while the parsed Texinfo tree is described in Texinfo::Parser.


17.5 Texinfo::Convert::Converter METHODS


17.5.1 Converter Initialization

A converter object for a module subclassing Texinfo::Convert::Converter is created by calling the converter method, which should be inherited from Texinfo::Convert::Converter.

$converter = MyConverter->converter($options)

The $options hash reference holds options for the converter. These options should be Texinfo customization options. The customization options are described in the Texinfo manual or in the customization API manual.

The converter function returns a converter object (a blessed hash reference) after checking the options and performing some initializations.

To help with the initializations, the modules subclassing Texinfo::Convert::Converter can define two methods:

\%defaults = $converter_or_class->converter_defaults($options)

Returns a reference to a hash with defaults for the converter module customization options, or undef. The $options hash reference holds options for the converter. This method is called on a converter object by the converter method, but it may also be called on a converter module class.

converter_initialize

This method is called at the end of the Texinfo::Convert::Converter converter initialization.


17.5.2 Conversion

For conversion with output and convert, a document should be associated to the converter, in general the document passed as argument to output or convert. The set_document function associates a Texinfo::Document to a converter. This function is used in the default implementations.

$converter->set_document($document)

Associate $document to $converter. Also set the encoding related customization options based on $converter customization information and information on document encoding, and set up the convert_text_options value in the converter hash, which can be used to call Texinfo::Convert::Text::convert_to_text.

The conversion_initialization, conversion_finalization, conversion_output_begin and conversion_output_end methods can be redefined to run code at various moments:

$converter->conversion_initialization($document)
$converter->conversion_finalization()

conversion_initialization is called at the beginning of output_tree and of the default implementations of the output and convert methods. conversion_finalization is called at the end of output_tree and of the default implementations of the output and convert methods. These methods should be redefined to have code run before and after a document conversion.

In the default case, conversion_initialization calls set_document to associate the Texinfo::Document document passed in argument to the converter. A subclass converter redefining conversion_initialization should in general call set_document in the redefined function too to associate the converted document to the converter.

$beginning = $converter->conversion_output_begin($output_file, $output_filename)
$end = $converter->conversion_output_end()

The string $beginning returned by conversion_output_begin is output by the calling method, output_tree, before the Texinfo tree conversion. The $output_file argument is the output file path. If $output_file is an empty string, it means that text will be returned by the converter instead of being written to an output file. $output_filename is, in general, the file name portion of $output_file (without directory) but can also be set based on @setfilename.

The string $end returned by conversion_output_end is output by the calling method, output_tree, after the Texinfo tree conversion.

The default implementations of both methods return an empty string.
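The order in which these hooks run can be illustrated with a small stand-in. The sketch below does not use the real Texinfo::Convert::Converter: the Sketch::Converter and Sketch::MyConverter packages, the new constructor, the 'calls' key and the plain hash used as a document are all hypothetical, and the real output_tree does more (file handling in particular). It only mirrors the call sequence described above.

```perl
use strict;
use warnings;

# Stand-in parent class, NOT the real Texinfo::Convert::Converter.
package Sketch::Converter;

sub new { my $class = shift; return bless {'calls' => []}, $class; }

sub set_document {
  my ($self, $document) = @_;
  $self->{'document'} = $document;
}

sub conversion_initialization {
  my ($self, $document) = @_;
  push @{$self->{'calls'}}, 'conversion_initialization';
  $self->set_document($document) if ($document);
}

sub conversion_finalization {
  my $self = shift;
  push @{$self->{'calls'}}, 'conversion_finalization';
}

# The default hooks return an empty string.
sub conversion_output_begin { return ''; }
sub conversion_output_end { return ''; }

# Simplified call sequence: initialization, beginning, tree, end, finalization.
sub output_tree {
  my ($self, $document) = @_;
  $self->conversion_initialization($document);
  my $result = $self->conversion_output_begin('', '');
  $result .= $self->convert_tree($document->{'tree'});
  $result .= $self->conversion_output_end();
  $self->conversion_finalization();
  return $result;
}

# Hypothetical subclass redefining the two output hooks.
package Sketch::MyConverter;
our @ISA = ('Sketch::Converter');

sub conversion_output_begin { return "<doc>\n"; }
sub conversion_output_end { return "</doc>\n"; }
sub convert_tree { my ($self, $tree) = @_; return "$tree\n"; }

package main;

my $converter = Sketch::MyConverter->new();
# A plain hash stands in for a Texinfo::Document here.
my $output = $converter->output_tree({'tree' => 'body'});
print $output;
```

The subclass only supplies the beginning and end strings and the tree conversion; the initialization and finalization calls come from the parent.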

Calling conversion_initialization and, if needed, conversion_finalization in redefined output and convert methods is not mandated, but it is recommended in order to keep converter code consistent. In subclassed converters that do not need to define conversion_initialization, calling the default Texinfo::Convert::Converter conversion_initialization implementation is also recommended to avoid having to explicitly call set_document. If conversion_initialization is defined in a converter subclass, it is recommended to call set_document at the very beginning of the function to have the document associated to the converter.


17.5.3 Getting and setting customization variables

Texinfo::Convert::Converter implements a simple interface to set and retrieve Texinfo customization variables. Helper functions from diverse Texinfo modules needing customization information expect an object implementing get_conf and/or set_conf. The converter itself can therefore be used in such cases.

Customization variables are typically set up when initializing a converter with converter and completed by the values of Texinfo informative @-commands tree elements, for commands such as @frenchspacing or @footnotestyle.

$converter->force_conf($variable_name, $variable_value)

Set the Texinfo customization option $variable_name to $variable_value. This should rarely be used, but the purpose of this method is to be able to revert a customization that is always wrong for a given output format, like the splitting for example.

$converter->get_conf($variable_name)

Returns the value of the Texinfo customization variable $variable_name.

$status = $converter->set_conf($variable_name, $variable_value)

Set the Texinfo customization option $variable_name to $variable_value if not set as a converter option. Returns false if the customization option was not set.
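The difference between set_conf and force_conf can be sketched with a stand-in object. The Sketch::Conf package, its new constructor and its internal 'conf' and 'set_as_option' keys are hypothetical and are not how Texinfo::Convert::Converter stores its data; only the behaviour described above is mirrored: set_conf leaves alone a value given as a converter option and reports failure, while force_conf always sets the value.

```perl
use strict;
use warnings;

# Stand-in class, NOT the real converter implementation.
package Sketch::Conf;

sub new {
  my ($class, $options) = @_;
  my $self = bless {'conf' => {}, 'set_as_option' => {}}, $class;
  # Remember which variables were given as converter options.
  foreach my $name (keys %{$options || {}}) {
    $self->{'conf'}->{$name} = $options->{$name};
    $self->{'set_as_option'}->{$name} = 1;
  }
  return $self;
}

sub get_conf {
  my ($self, $name) = @_;
  return $self->{'conf'}->{$name};
}

# Set only if the variable was not set as a converter option.
sub set_conf {
  my ($self, $name, $value) = @_;
  return 0 if ($self->{'set_as_option'}->{$name});
  $self->{'conf'}->{$name} = $value;
  return 1;
}

# Set unconditionally.
sub force_conf {
  my ($self, $name, $value) = @_;
  $self->{'conf'}->{$name} = $value;
  return 1;
}

package main;

my $converter = Sketch::Conf->new({'SPLIT' => 'chapter'});

# SPLIT was given as an option, so set_conf does not override it.
my $split_status = $converter->set_conf('SPLIT', '');
my $split_after_set = $converter->get_conf('SPLIT');

# footnotestyle was not given as an option, so set_conf succeeds.
my $style_status = $converter->set_conf('footnotestyle', 'separate');

# force_conf overrides even a value given as an option.
$converter->force_conf('SPLIT', '');
my $split_after_force = $converter->get_conf('SPLIT');
```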


17.5.4 Registering error and warning messages

Texinfo::Convert::Converter implements an interface to register error and warning messages in the converter, which can be retrieved later on, in general to be given to Texinfo::Report::add_formatted_message. Underneath, Texinfo::Report is used to set up the messages data structure.

$converter->converter_document_error($text, $continuation)
$converter->converter_document_warn($text, $continuation)

Register a warning or an error. The $text is the text of the error or warning.

The optional $continuation argument, if true, conveys that the line is a continuation line of a message.

$converter->converter_line_error($text, $error_location_info, $continuation)
$converter->converter_line_warn($text, $error_location_info, $continuation)

Register a warning or an error with line information. The $text is the text of the error or warning. The $error_location_info argument holds the information on the error or warning location. The $error_location_info hash reference may be obtained from the source_info key of Texinfo elements. It may also be set up to point to a file name, using the file_name key, and to a line number, using the line_nr key. The file_name key value should be a binary string.

The optional $continuation argument, if true, conveys that the line is a continuation line of a message.
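As an illustration of a hand-built $error_location_info, the sketch below encodes a character string file name to a binary string before storing it under file_name. The file name, the line number and the choice of UTF-8 are assumptions made for the example; a real converter would use whatever encoding applies to its file names.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical file name, as a character string with a non-ASCII letter.
my $character_file_name = "caf\x{e9}.texi";

# file_name should be a binary string, so encode it (UTF-8 is assumed here).
my $error_location_info = {
  'file_name' => encode('UTF-8', $character_file_name),
  'line_nr' => 12,
};

# With a converter object, the hash reference would then be passed as
# second argument, for example:
#   $converter->converter_line_warn($text, $error_location_info);
```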

\@error_warning_messages = $converter->get_converter_errors()

Return a reference to an array containing the error or warning messages registered in the converter. Error and warning messages are hash references as described in Texinfo::Report::errors and can be used as input of Texinfo::Report::add_formatted_message.


17.5.5 Translations in output documents

Texinfo::Convert::Converter provides wrappers around Texinfo::Translations methods that set the language to the current documentlanguage.

The cdt and pcdt methods are used to translate strings to be output in converted documents, and return a Texinfo tree. The cdt_string method is similar but returns a simple string, for already converted strings.

$tree = $converter->cdt($string, $replaced_substrings, $translation_context)
$string = $converter->cdt_string($string, $replaced_substrings, $translation_context)

The $string is a string to be translated. With cdt the function returns a Texinfo tree, as the string is interpreted as Texinfo code after translation. With cdt_string a string is returned.

$replaced_substrings is an optional hash reference specifying some substitutions to be done after the translation. The keys of the $replaced_substrings hash reference identify what is to be substituted. In the string to be translated, words in braces matching keys of $replaced_substrings are replaced. For cdt, the value is a Texinfo tree that is substituted in the resulting Texinfo tree. For cdt_string, the value is a string that is replaced in the resulting string.

The $translation_context is optional. If not undef this is a translation context string for $string. It is the first argument of pgettext in the C API of Gettext.
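The brace substitution done for cdt_string can be sketched as a plain string operation. The sketch_substitute_braced function below is hypothetical and leaves out the translation step entirely; it only shows how words in braces matching keys of $replaced_substrings are replaced by the corresponding string values, and the sample string and keys are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the substitution step only (no translation).
sub sketch_substitute_braced {
  my ($string, $replaced_substrings) = @_;
  return $string unless (defined($replaced_substrings)
                         and scalar(keys(%$replaced_substrings)));
  # Match only braced words that are keys of the hash.
  my $keys_re = join('|', map { quotemeta($_) } keys(%$replaced_substrings));
  $string =~ s/\{($keys_re)\}/$replaced_substrings->{$1}/g;
  return $string;
}

my $substituted = sketch_substitute_braced('See {reference}, page {page}',
                             {'reference' => 'Introduction', 'page' => '4'});
print "$substituted\n";

# A braced word that is not a key is left as is.
my $untouched = sketch_substitute_braced('{other} text', {'page' => '4'});
print "$untouched\n";
```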

$tree = $object->pcdt($translation_context, $string, $replaced_substrings)

Same as cdt except that the $translation_context is not optional. This function is useful to mark strings with a translation context for translation. This function is similar to pgettext in the Gettext C API.


17.5.6 Index sorting

You should call the following methods to sort indices in conversion:

$sorted_indices = $converter->get_converter_indices_sorted_by_index()
$sorted_indices = $converter->get_converter_indices_sorted_by_letter()

get_converter_indices_sorted_by_letter returns the indices sorted by index and letter, while get_converter_indices_sorted_by_index returns the indices with all entries of an index together.

When sorting by letter, an array reference of letter hash references is associated with each index name. Each letter hash reference has two keys, a letter key with the letter, and an entries key with an array reference of sorted index entries beginning with the letter. The letter is a character string suitable for sorting letters, but is not necessarily the best to use for output.

When simply sorting, the array of the sorted index entries is associated with the index name.
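The shape of the by-letter result can be illustrated with hand-built data. In the sketch below, the index name, the letters and the entries are made up, and plain strings stand in for the index entries that the real method returns; only the nesting described above (index name, then letter hashes with letter and entries keys) is taken from the text.

```perl
use strict;
use warnings;

# Hand-built stand-in for the result of
# get_converter_indices_sorted_by_letter; real entries are not strings.
my $sorted_indices = {
  'cp' => [
    {'letter' => 'A', 'entries' => ['accent', 'anchor']},
    {'letter' => 'B', 'entries' => ['brace']},
  ],
};

my @lines;
foreach my $index_name (sort(keys(%$sorted_indices))) {
  push @lines, "Index $index_name";
  # Each index is associated with an array of letter hash references.
  foreach my $letter_entries (@{$sorted_indices->{$index_name}}) {
    push @lines, "  $letter_entries->{'letter'}";
    foreach my $entry (@{$letter_entries->{'entries'}}) {
      push @lines, "    $entry";
    }
  }
}
print join("\n", @lines), "\n";
```

With the by-index result, the inner loop over letters would be dropped, since the array of sorted entries is associated directly with the index name.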

The functions call Texinfo::Document::sorted_indices_by_letter or Texinfo::Document::sorted_indices_by_index with arguments based on USE_UNICODE_COLLATION, COLLATION_LANGUAGE and DOCUMENTLANGUAGE_COLLATION customization options, and, if relevant, current @documentlanguage.


17.5.7 Conversion to XML

Some Texinfo::Convert::Converter methods target conversion to XML. Most methods take a $converter as argument to get some information and use methods for error reporting.

$formatted_text = $converter->xml_format_text_with_numeric_entities($text)

Replace quotation marks and hyphens used to represent dash in Texinfo text with numeric XML entities.

$protected_text = $converter->xml_protect_text($text)

Protect special XML characters (&, <, >, ") of $text.
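A minimal sketch of this kind of protection follows, assuming the usual XML named entities for the four characters listed. The sketch_xml_protect_text function is a hypothetical stand-in written for the example, not the module's code. The ampersand is replaced first so that the entities introduced by the later substitutions are not protected a second time.

```perl
use strict;
use warnings;

# Hypothetical stand-in, not the real xml_protect_text.
sub sketch_xml_protect_text {
  my $text = shift;
  # Ampersand first, otherwise the '&' of the other entities
  # would be escaped again.
  $text =~ s/&/&amp;/g;
  $text =~ s/</&lt;/g;
  $text =~ s/>/&gt;/g;
  $text =~ s/"/&quot;/g;
  return $text;
}

my $protected_text = sketch_xml_protect_text('a < b & "c"');
print "$protected_text\n";
```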

$comment = $converter->xml_comment($text)

Returns an XML comment for $text.

$result = xml_accent($text, $accent_command, $in_upper_case, $use_numeric_entities)

$text is the text appearing within an accent command. $accent_command should be a Texinfo tree element corresponding to an accent command taking an argument. $in_upper_case is optional, and, if set, the text is put in upper case. The function returns the accented letter as XML named entity if possible, falling back to numeric entities if there is no named entity and returns the argument as last resort. $use_numeric_entities is optional. If set, numerical entities are used instead of named entities if possible.

$result = $converter->xml_accents($accent_command, $in_upper_case)

$accent_command is an accent command, which may have other accent commands nested. If $in_upper_case is set, the result should be upper cased. The function returns the accents formatted as XML.

$result = xml_numeric_entity_accent($accent_command_name, $text)

$accent_command_name is the name of an accent command. $text is the text appearing within the accent command. Returns the accented letter as XML numeric entity, or undef if there is no such entity.


17.5.8 Helper methods

The module provides methods that may be useful for converters. Most methods take a $converter as argument to get some information and use methods for error reporting (see Registering error and warning messages) and for string translation (see Translations in output documents). For useful methods that need a converter optionally and can be used in converters that do not inherit from Texinfo::Convert::Converter, see Texinfo::Convert::Utils.

$contents_element = $converter->comma_index_subentries_tree($entry, $separator)

$entry is a Texinfo tree index entry element. The function sets up an array with the @subentry contents. The result is returned as contents in the $contents_element element, or undef if there is no such content. $separator is an optional separator argument used, if given, instead of the default: a comma followed by a space.

$result = $converter->convert_accents($accent_command, \&format_accents, $output_encoded_characters, $in_upper_case)

$accent_command is an accent command, which may have other accent commands nested. The function returns the accents formatted either as encoded letters if $output_encoded_characters is set, or formatted using \&format_accents. If $in_upper_case is set, the result should be uppercased.

$succeeded = $converter->create_destination_directory($destination_directory_path, $destination_directory_name)

Create destination directory $destination_directory_path. $destination_directory_path should be a binary string, while $destination_directory_name should be a character string that can be used in error messages. $succeeded is true if the creation was successful or unneeded, false otherwise.

($output_file, $destination_directory, $output_filename, $document_name, $input_basefile) = $converter->determine_files_and_directory($output_format)

Determine output file and directory, as well as names related to files. The result depends on the presence of @setfilename, on the Texinfo input file name, and on customization options such as OUTPUT, SUBDIR or SPLIT, as described in the Texinfo manual. If $output_format is defined and not an empty string, _$output_format is appended to the default directory name.

$output_file is mainly relevant when not split and should be used as the output file name. In general, if not split and $output_file is an empty string, it means that text should be returned by the converter instead of being written to an output file. This is used in the test suite. $destination_directory is either the directory $output_file is in, or if split, the directory where the files should be created. $output_filename is, in general, the file name portion of $output_file (without directory) but can also be set based on @setfilename, in particular when $output_file is an empty string. $document_name is $output_filename without extension. $input_basefile is based on the input Texinfo file name, with the file name portion only (without directory).

The strings returned are text strings.
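The relations between the returned names in the simple, not split case can be sketched with the core File::Basename module. The sketch_file_parts function and the sample path are hypothetical. The real method also takes @setfilename, the input file name and customization options into account, and may format the directory differently (fileparse keeps a trailing directory separator, for instance), so this only illustrates how the names relate to each other.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical stand-in showing how the names relate when not split.
sub sketch_file_parts {
  my $output_file = shift;
  # File name portion and directory of the output file.
  my ($output_filename, $destination_directory) = fileparse($output_file);
  # The document name is the file name without extension.
  my $document_name = $output_filename;
  $document_name =~ s/\.[^.]*$//;
  return ($destination_directory, $output_filename, $document_name);
}

my ($destination_directory, $output_filename, $document_name)
  = sketch_file_parts('out/manual.html');
print "$destination_directory $output_filename $document_name\n";
```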

($encoded_name, $encoding) = $converter->encoded_input_file_name($character_string_name, $input_file_encoding)
($encoded_name, $encoding) = $converter->encoded_output_file_name($character_string_name)

Encode $character_string_name in the same way as other file names are encoded in the converter, based on customization variables, and possibly on the input file encoding. Return the encoded name and the encoding used to encode the name. The encoded_input_file_name and encoded_output_file_name functions use different customization variables to determine the encoding.

The $input_file_encoding argument is optional. If set, it is used for the input file encoding. It is useful if there is more precise information on the input file encoding where the file name appeared.

Note that encoded_output_file_name is a wrapper around Texinfo::Convert::Utils::encoded_output_file_name, and encoded_input_file_name is a wrapper around Texinfo::Convert::Utils::encoded_input_file_name.

($caption, $prepended) = $converter->float_name_caption($float)

$float is a Texinfo tree @float element. This function returns the caption element that should be used for the float formatting and the $prepended Texinfo tree combining the type and label of the float.

$tree = $converter->float_type_number($float)

$float is a Texinfo tree @float element. This function returns the type and number of the float as a Texinfo tree with translations.

$end_line = $converter->format_comment_or_return_end_line($element)

Format comment at end of line or return the end of line associated with the element. In many cases, converters ignore comments and output is better formatted with new lines added independently of the presence of newline or comment in the initial Texinfo line, so most converters are better off not using this method.

$filename = $converter->node_information_filename($normalized, $label_element)

Returns the normalized file name corresponding to the $normalized node name and to the $label_element node name element contents.

($normalized_name, $filename) = $converter->normalized_sectioning_command_filename($element)

Returns a normalized name $normalized_name corresponding to a sectioning command tree element $element, expanding the command argument using transliteration and character protection. Also returns $filename, the corresponding file name based on $normalized_name, taking into account additional constraints on file names and adding a file extension.

$converter->present_bug_message($message, $element)

Show a bug message using $message text. Use information on $element tree element if given in argument.

$converter->set_global_document_commands($commands_location, $selected_commands)

Set the Texinfo customization options for @-commands. $selected_commands is an array reference containing the @-commands set. $commands_location specifies where in the document the value should be taken from. The possibilities are:

before

Set to the values before document conversion, from defaults and command-line.

last

Set to the last value for the command.

preamble

Set sequentially to the values in the Texinfo preamble.

preamble_or_first

Set to the first value of the command if the first command is not in the Texinfo preamble, else set as with preamble, sequentially to the values in the Texinfo preamble.

Notice that the only effect of this function is to set a customization variable value. No @-command side effects are run and no associated customization variables are set.

For more information on the function used to set the value for each of the commands, see Texinfo::Common set_global_document_command.

$table_item_tree = $converter->table_item_content_tree($element)

$element should be an @item or @itemx tree element. Returns a tree in which the @-command in argument of @*table of the $element has been applied to the $element line argument, or undef.

$result = $converter->top_node_filename($document_name)

Returns a file name for the Top node file using either TOP_FILE customization value, or EXTENSION customization value and $document_name.

Finally, there is:

$result = $converter->output_internal_links()

At this level, the method just returns undef. It is used in the HTML output, as specified by the --internal-links option of texi2any.


17.6 Texinfo::Convert::Converter SEE ALSO

Texinfo::Common, Texinfo::Convert::Unicode, Texinfo::Report, Texinfo::Translations, Texinfo::Convert::Utils and Texinfo::Parser.


17.7 Texinfo::Convert::Converter AUTHOR

Patrice Dumas, <pertusus@free.fr>