14 Texinfo::Convert::Unicode


14.1 Texinfo::Convert::Unicode NAME

Texinfo::Convert::Unicode - Representation as Unicode characters


14.2 Texinfo::Convert::Unicode SYNOPSIS

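  use Texinfo::Convert::Unicode qw(unicode_accent encoded_accents
                                   unicode_text);
  use Texinfo::Convert::Text qw(convert_to_text);
  use Texinfo::Convert::Utils;

  # $accent is a Texinfo tree element for an accent command
  my ($contents_element, $stack)
      = Texinfo::Convert::Utils::find_innermost_accent_contents($accent);

  # format the nested accents in $encoding, with an ASCII fallback
  my $formatted_accents = encoded_accents($converter,
                 convert_to_text($contents_element), $stack, $encoding,
                 \&Texinfo::Convert::Text::ascii_accent_fallback);

  # $accent_command is a Texinfo tree element for an accent command
  my $accent_text = unicode_accent('e', $accent_command);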

14.3 Texinfo::Convert::Unicode NOTES

The main purpose of the Texinfo Perl modules is to be used in texi2any to convert Texinfo to other formats. There is no promise of API stability.


14.4 Texinfo::Convert::Unicode DESCRIPTION

Texinfo::Convert::Unicode provides methods dealing with Unicode representation and conversion of Unicode code points, to be used in Texinfo converters.

When an encoding supported in Texinfo is passed as an argument to a method of this module, the method should return accented letters or other characters as Unicode code points only if Perl is known to be able to convert those code points to characters of that encoding. Note that the actual conversion is done by Perl, not by the module.


14.5 Texinfo::Convert::Unicode METHODS

$result = brace_no_arg_command($command_name, $encoding)

Return the Unicode representation of $command_name, a brace command that takes no argument (like @bullet{}, @aa{} or @guilsinglleft{}), or undef if the Unicode representation cannot be converted to encoding $encoding.

$possible_conversion = check_unicode_point_conversion($arg, $output_debug)

Check that it is possible to output the actual UTF-8 bytes corresponding to the Unicode code point given as the hexadecimal string $arg (such as 201D). Perl gives a warning and will not output UTF-8 for Unicode non-characters such as U+10FFFF. If the optional $output_debug argument is set, a debugging warning is emitted when the conversion test fails. Returns 1 if the conversion is possible and can be attempted, 0 otherwise.
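The kind of check described above can be approximated with core Perl only. The following is a minimal sketch, not the module's implementation: the helper name sketch_point_is_convertible is hypothetical, and it relies on Perl's Noncharacter_Code_Point property and explicit range tests rather than on trapping Perl warnings.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Texinfo::Convert::Unicode.
# $hex is a code point written as a hexadecimal string, such as '201D'.
sub sketch_point_is_convertible {
    my ($hex) = @_;
    return 0 unless defined $hex and $hex =~ /\A[0-9A-Fa-f]{1,6}\z/;
    my $code = hex($hex);
    # Beyond the Unicode range or in the surrogate range: not convertible.
    return 0 if $code > 0x10FFFF;
    return 0 if $code >= 0xD800 and $code <= 0xDFFF;
    # Non-characters such as U+FFFE or U+10FFFF are rejected too.
    return 0 if chr($code) =~ /\p{Noncharacter_Code_Point}/;
    return 1;
}

# UTF-8 bytes of U+201D, shown as hexadecimal.
my $bytes = Encode::encode('UTF-8', chr(hex('201D')));
my $hex_bytes = join(' ', map { sprintf('%02X', ord($_)) } split(//, $bytes));
print "$hex_bytes\n" if sketch_point_is_convertible('201D');   # E2 80 9D
```

With this sketch, '201D' is accepted while '10FFFF' and 'FFFE' are rejected.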

$result = encoded_accents($converter, $text, $stack, $encoding, $format_accent, $set_case)

$encoding is the encoding the accented characters should be encoded to. If $encoding is not set, $result is undef. Nested accents and their content are passed with $text and $stack. $text is the text appearing within nested accent commands. $stack is an array reference holding the Texinfo tree elements of the nested accents. In general, $text is the formatted contents and $stack the stack returned by Texinfo::Convert::Utils::find_innermost_accent_contents. The function tries to convert as many accents as possible to $encoding, starting from the innermost accent.

$format_accent is a function reference that is used to format the accent commands if there is no encoded character available at some point of the conversion of the $stack. $converter is a converter object optionally used by $format_accent. It may be undef if $format_accent does not need a converter object.

The $set_case argument is optional. If $set_case is positive, the result is upper-cased, while if it is negative, the result is lower-cased.
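The innermost-first strategy with a fallback can be illustrated with core Perl modules. This is a simplified sketch under stated assumptions, not the module's code. The accents are plain command-name strings instead of Texinfo tree elements. The %sketch_combining_mark table and the sketch_* names are hypothetical. Composition is done with Unicode::Normalize and encodability is tested with Encode, where the real module uses its own data. The $converter argument is left out.

```perl
use strict;
use warnings;
use feature 'unicode_strings';
use Encode ();
use Unicode::Normalize qw(NFC);

# Assumed mapping from a few accent command names to combining marks.
my %sketch_combining_mark = (
    "'" => "\x{0301}",   # acute
    '`' => "\x{0300}",   # grave
    '^' => "\x{0302}",   # circumflex
    '~' => "\x{0303}",   # tilde
);

# True if Perl can encode $string in $encoding.
sub sketch_can_encode {
    my ($encoding, $string) = @_;
    my $ok = eval {
        Encode::encode($encoding, $string,
                       Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

# $accents is an array reference of accent names, innermost first.
sub sketch_encoded_accents {
    my ($text, $accents, $encoding, $format_accent, $set_case) = @_;
    return undef unless defined $encoding;
    my $result = $text;
    my @remaining = @$accents;
    while (@remaining) {
        my $mark = $sketch_combining_mark{$remaining[0]};
        last unless defined $mark;
        my $candidate = NFC($result . $mark);
        last unless sketch_can_encode($encoding, $candidate);
        $result = $candidate;
        shift @remaining;
    }
    # Accents that could not be encoded go through the fallback formatter.
    $result = $format_accent->($result, $_) for @remaining;
    if ($set_case) {
        $result = $set_case > 0 ? uc($result) : lc($result);
    }
    return $result;
}

# Hypothetical fallback: append the accent command name to the text.
my $ascii_fallback = sub { my ($text, $accent) = @_; return $text . $accent; };
```

For example, with 'e', the accents ['^', '~'] and the iso-8859-1 encoding, the circumflex is composed into U+00EA, while the tilde cannot be represented in that encoding and is handed to the fallback.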

$width = string_width($string)

Return the string width, taking into account the fact that some characters have a zero width (like combining accents) while some have a width of 2 (most Chinese characters, for example).
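The width rules above can be sketched with Perl's built-in Unicode properties. This is an approximation for illustration only. The name sketch_string_width is hypothetical, and the choice of properties (nonspacing marks for zero width, East Asian Wide and Fullwidth for width 2) is an assumption, not necessarily what the module uses.

```perl
use strict;
use warnings;

# Hypothetical helper approximating a display width computation.
sub sketch_string_width {
    my ($string) = @_;
    my $width = 0;
    for my $char (split(//, $string)) {
        if ($char =~ /\p{Mn}/) {
            # Nonspacing marks, such as combining accents, take no column.
            next;
        } elsif ($char =~ /\p{East_Asian_Width=Wide}/
                 or $char =~ /\p{East_Asian_Width=Fullwidth}/) {
            $width += 2;
        } else {
            $width += 1;
        }
    }
    return $width;
}

print sketch_string_width("e\x{0301}"), "\n";        # 1
print sketch_string_width("\x{4E2D}\x{6587}"), "\n"; # 4
```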

$result = unicode_accent($text, $accent_command)

$text is the text appearing within an accent command. $accent_command should be a Texinfo tree element corresponding to an accent command taking an argument. The function returns the Unicode representation of the accented character.
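As an illustration of what a Unicode representation of an accented character means, the sketch below appends a combining mark to the text and composes the result with the core Unicode::Normalize module. It is not the module's implementation. It takes an accent command name as a plain string instead of a Texinfo tree element, and the %sketch_marks table and the sketch_unicode_accent name are hypothetical.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Assumed mapping from a few accent command names to combining marks.
my %sketch_marks = (
    "'" => "\x{0301}",   # acute
    '`' => "\x{0300}",   # grave
    '^' => "\x{0302}",   # circumflex
    ',' => "\x{0327}",   # cedilla
);

# Returns the composed character, or undef for an unknown accent
# (the undef return is a choice made for this sketch).
sub sketch_unicode_accent {
    my ($text, $accent_name) = @_;
    my $mark = $sketch_marks{$accent_name};
    return undef unless defined $mark;
    return NFC($text . $mark);
}

printf "U+%04X\n", ord(sketch_unicode_accent('e', "'"));   # U+00E9
```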

$is_decoded = unicode_point_decoded_in_encoding($encoding, $unicode_point)

Return true if $unicode_point can be encoded in the encoding $encoding. $unicode_point should be specified as a four-character string representing a hexadecimal number, with letters in upper case (such as 201D). When the encoding does not cover the whole Unicode range, tables are used to determine whether $unicode_point can be encoded.

If the encoding is not supported in Texinfo, the result will always be false.
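A comparable test can be written with the core Encode module. This sketch swaps in a different technique from the one described above: it asks Perl to attempt the encoding instead of consulting tables, and it treats encodings unknown to Encode as unsupported, which need not match the set of encodings supported in Texinfo. The name sketch_point_in_encoding is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the code point given as a four-character
# upper-case hexadecimal string can be encoded in $encoding.
sub sketch_point_in_encoding {
    my ($encoding, $unicode_point) = @_;
    return 0 unless defined $encoding and defined $unicode_point;
    return 0 unless $unicode_point =~ /\A[0-9A-F]{4}\z/;
    my $encoder = Encode::find_encoding($encoding);
    return 0 unless defined $encoder;      # encoding unknown to Encode
    my $char = chr(hex($unicode_point));
    my $ok = eval {
        # FB_CROAK makes encode() die on an unmappable character.
        $encoder->encode($char, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

print sketch_point_in_encoding('iso-8859-1', '00E9'), "\n";   # 1
print sketch_point_in_encoding('iso-8859-1', '201D'), "\n";   # 0
```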

$result = unicode_text($text, $in_code)

Return $text with dashes and quotes, written for example as --- or ', replaced by the corresponding Unicode code points. If $in_code is set, the text is considered to be in code style.
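The substitutions can be pictured with a few regular expressions. The sketch below is an assumption-laden illustration, not the module's code. The exact set of replacements (--- to U+2014, -- to U+2013, doubled and single quotes to their curly counterparts) follows the usual Texinfo input conventions, and leaving the text untouched in code style is a simplification that may differ from the real function. The name sketch_unicode_text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper replacing ASCII dashes and quotes by Unicode
# characters. Longer sequences are replaced before shorter ones.
sub sketch_unicode_text {
    my ($text, $in_code) = @_;
    # Assumption of this sketch: code style text is returned unchanged.
    return $text if $in_code;
    $text =~ s/---/\x{2014}/g;   # em dash
    $text =~ s/--/\x{2013}/g;    # en dash
    $text =~ s/``/\x{201C}/g;    # left double quotation mark
    $text =~ s/''/\x{201D}/g;    # right double quotation mark
    $text =~ s/`/\x{2018}/g;     # left single quotation mark
    $text =~ s/'/\x{2019}/g;     # right single quotation mark
    return $text;
}
```

For instance, sketch_unicode_text("a---b--c") yields "a", U+2014, "b", U+2013, "c", while the same call with a true second argument returns the input as is.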


14.6 Texinfo::Convert::Unicode AUTHOR

Patrice Dumas, <pertusus@free.fr>