

16.1.6.1 Specifying ITS Rules

Translatable strings in an XML file are marked through a separate "rule" file, which uses the Internationalization Tag Set standard (ITS, https://www.w3.org/TR/its20/). The currently supported ITS data categories are: ‘Translate’, ‘Localization Note’, ‘Elements Within Text’, and ‘Preserve Space’. In addition, xgettext recognizes the following extended data categories:

Context

This data category associates a msgctxt with the extracted text. In the global rule, the contextRule element has the following attributes:

  • A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  • A required contextPointer attribute that contains a relative selector pointing to a node that holds the msgctxt value.
  • An optional textPointer attribute that contains a relative selector pointing to a node that holds the msgid value.
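
As an illustration of the attributes listed above, here is a minimal sketch of a rule file with a context rule. The prefix gt and the context attribute on p are assumptions made for this example only: any prefix bound to the extensions namespace should do, and the attribute name is hypothetical. The its:rules header mirrors the one used in the example later in this section.

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <!-- Hypothetical input element: <p context="menu">Open</p> -->
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- selector is absolute; contextPointer is relative to each selected
       'p' and points at the (hypothetical) 'context' attribute, whose
       value becomes the msgctxt.  -->
  <gt:contextRule selector="//message/p" contextPointer="@context"/>
</its:rules>
```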

Extended Preserve Space

This data category extends the standard ‘Preserve Space’ data category with the additional values ‘trim’ and ‘paragraph’. ‘trim’ means to remove leading and trailing whitespace from the content, but not to normalize whitespace in the middle. ‘paragraph’ means to normalize the content but keep the paragraph boundaries. In the global rule, the preserveSpaceRule element has the following attributes:

  • A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  • A required space attribute with the value default, preserve, trim, or paragraph.
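
A rule using one of the extended values might look like the following sketch. The gt prefix and the //message/p selector are assumptions chosen for illustration; only the element name, the attribute names, and the allowed values come from the description above.

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- Strip leading and trailing whitespace from each 'p', leaving
       whitespace in the middle of the text untouched.  -->
  <gt:preserveSpaceRule selector="//message/p" space="trim"/>
</its:rules>
```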

Escape Special Characters

This data category indicates whether the special XML characters (<, >, &, ") are escaped with entity references. In the global rule, the escapeRule element has the following attributes:

  • A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  • A required escape attribute with the value yes or no.
  • An optional unescape-if attribute with the value xml, xhtml, html, or no.

The default values, escape="no" and unescape-if="no", should be good for most XML file types. A rule with escape="no", which was necessary with GNU gettext versions before 0.23, is now redundant.

The unescape-if attribute is useful for XML file types that present messages with embedded XML elements to the translator. Examples of such file types are DocBook and XHTML. If unescape-if="xml" is specified and the translation of a message looks like valid XML, the usual escaping of <, >, and character references is omitted. The resulting XML document is then likely what the translator intended. However, if the translator did not merely copy the XML markup from the message to the translation, but added or removed markup, the resulting XML document may be invalid. It is therefore useful to check the resulting XML document against the appropriate XML schema or DTD after invoking msgfmt.

Similarly, if unescape-if="xhtml" is specified and the translation looks like valid XHTML, the usual escaping is omitted. And likewise for unescape-if="html".
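
For a file type with embedded markup, an escape rule could be sketched as follows. The gt prefix and the //message/p selector are assumptions for illustration. Since the escape attribute is required, the sketch simply repeats the default value named above; which value suits a given file type is not settled by this example.

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- For each 'p', omit the usual escaping when the translation
       looks like valid XML.  -->
  <gt:escapeRule selector="//message/p" escape="no" unescape-if="xml"/>
</its:rules>
```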

All of these extended data categories can only be expressed with global rules, and the rule elements must be in the https://www.gnu.org/s/gettext/ns/its/extensions/1.0 namespace.

Consider the following XML document in a file messages.xml:

<?xml version="1.0"?>
<messages>
  <message>
    <p>A translatable string</p>
  </message>
  <message>
    <p translatable="no">A non-translatable string</p>
  </message>
</messages>

To extract the first text content ("A translatable string"), but not the second ("A non-translatable string"), the following ITS rules can be used:

<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="/messages" translate="no"/>
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- If 'p' has an attribute 'translatable' with the value 'no', then
       the content is not translatable.  -->
  <its:translateRule selector="//message/p[@translatable = 'no']"
    translate="no"/>
</its:rules>

An ITS rules file must have the .its file extension and conform to either the XML Schema 1.0 definition in its.xsd10 or the XML Schema 1.1 definition in its.xsd11 and its auxiliary schema its-extensions.xsd.

