No string concatenation (GNU gettext utilities)


4.3.4 No string concatenation

Hardcoded string concatenation is sometimes used to construct English strings:

strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");

In order to present only entire sentences to the translator, and also because in some languages the translator might want to swap the order of object1 and object2, it is necessary to change this to use a format string:

sprintf (s, "Replace %s with %s?", object1, object2);
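The following C sketch shows why a single format string helps. The translation can reorder the arguments using POSIX positional directives such as %2$s and %1$s. The names untranslated, reordering_translation and replace_prompt are hypothetical. In a real program, the translate parameter would simply be gettext, and the English "translation" here stands in for a language that needs the other order.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for "no catalog loaded": gettext() then returns its argument.  */
static const char *
untranslated (const char *msgid)
{
  return msgid;
}

/* Hypothetical translation that swaps the two arguments using POSIX
   positional directives.  A real program would obtain such a string
   from a message catalog via gettext().  */
static const char *
reordering_translation (const char *msgid)
{
  if (strcmp (msgid, "Replace %s with %s?") == 0)
    return "Use %2$s instead of %1$s?";
  return msgid;
}

/* Build the prompt from ONE translatable format string; TRANSLATE plays
   the role of gettext().  */
static void
replace_prompt (char *buf, size_t size,
                const char *(*translate) (const char *),
                const char *object1, const char *object2)
{
  snprintf (buf, size, translate ("Replace %s with %s?"), object1, object2);
}
```

Calling replace_prompt with untranslated and the objects "foo" and "bar" yields "Replace foo with bar?". Calling it with reordering_translation yields "Use bar instead of foo?". The calling code is the same in both cases.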

String concatenation operator

In many programming languages, a particular operator denotes string concatenation at runtime (or possibly at compile time, if the compiler supports that).

In C++, string concatenation of std::string objects is denoted by the ‘+’ operator.
In Python, string concatenation is denoted by the ‘+’ operator.
In Java, string concatenation is denoted by the ‘+’ operator.
In C#, string concatenation is denoted by the ‘+’ operator.
In JavaScript and TypeScript, string concatenation is denoted by the ‘+’ operator.
In Go, string concatenation is denoted by the ‘+’ operator.
In Ruby, string concatenation is denoted by the ‘+’ operator.
In Shell, string concatenation is denoted by mere juxtaposition of strings.
In awk, string concatenation is denoted by mere juxtaposition of strings.
In Lua, string concatenation is denoted by the ‘..’ operator.
In Modula-2, string concatenation is denoted by the ‘+’ operator.
In D, string concatenation is denoted by the ‘~’ operator.
In Smalltalk, string concatenation is denoted by the ‘,’ operator.
In Vala, string concatenation is denoted by the ‘+’ operator.
In Perl, string concatenation is denoted by the ‘.’ operator.
In PHP, string concatenation is denoted by the ‘.’ operator.

So, for example, in Java, you would change

System.out.println("Replace "+object1+" with "+object2+"?");

into a statement involving a format string:

System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));

Similarly, in C#, you would change

Console.WriteLine("Replace "+object1+" with "+object2+"?");

into a statement involving a format string:

Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));

Strings with embedded expressions

In some programming languages, it is possible to have strings with embedded expressions. The expressions can refer to variables of the program. The value of such an expression is converted to a string and inserted in place of the expression; but no formatting function is called.

In Python, f-strings can contain expressions, for example f"Hello, {name}!".
In C#, since C# 6.0, interpolated strings can contain expressions, for example $"Hello, {name}!".
In JavaScript (since ES6) and in TypeScript, template literals can contain expressions, for example `Hello, ${name}!`.
In Ruby, interpolated strings can contain expressions, for example "Hello, #{name}!".
In the shell language, double-quoted strings can contain references to variables, along with default values and string operations, for example "Hello, $name!" or "Hello, ${name}!".
In D, interpolation expression sequences can contain expressions, for example i"Hello, $(name)!".
In Tcl, strings are subject to variable substitution, for example "Hello, $name!".
In Perl, interpolated strings can contain expressions, for example "Hello, $name!".
In PHP, string literals are subject to variable parsing, for example "Hello, $name!".

These cases are effectively string concatenation as well, just with a different syntax.

So, for example, in Python, you would change

print (f'Replace {object1.name} with {object2.name}?')

into a statement involving a format string:

print ('Replace %(name1)s with %(name2)s?'
       % { 'name1': object1.name, 'name2': object2.name })

or equivalently

print ('Replace {name1} with {name2}?'
       .format(name1 = object1.name, name2 = object2.name))

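And in JavaScript, in an environment that provides a print function and a format method on strings (neither is part of standard ECMAScript), you would change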

print (`Replace ${object1.name} with ${object2.name}?`)

into a statement involving a format string:

print ('Replace %s with %s?'.format(object1.name, object2.name))

Specifically in JavaScript, an alternative is to use a tagged template literal:

print (tag`Replace ${object1.name} with ${object2.name}?`)

and pass an option ‘--tag=tag:format’ to xgettext.
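The following C sketch illustrates why interpolation defeats translation. It uses a hypothetical one-entry catalog in which lookup is keyed on the exact untranslated string, as gettext does. A string that already contains the argument values therefore never matches. The catalog, its German entry and the function names are illustrative assumptions, not part of the gettext API.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical catalog entry: untranslated msgid and its translation.  */
struct entry
{
  const char *msgid;
  const char *msgstr;
};

static const struct entry catalog[] = {
  { "Replace %s with %s?", "Ersetze %s durch %s?" },
};

/* Return the translation of MSGID, or MSGID itself if there is none,
   mimicking the fallback behaviour of gettext().  */
static const char *
lookup (const char *msgid)
{
  for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
    if (strcmp (catalog[i].msgid, msgid) == 0)
      return catalog[i].msgstr;
  return msgid;
}

/* Interpolate first, translate second: the lookup key contains the
   argument values, so no catalog entry can match it.  */
static void
interpolate_then_translate (char *buf, size_t size,
                            const char *o1, const char *o2)
{
  char tmp[256];
  snprintf (tmp, sizeof tmp, "Replace %s with %s?", o1, o2);
  snprintf (buf, size, "%s", lookup (tmp));
}

/* Translate first, format second: the lookup key is the constant
   format string, which the catalog does contain.  */
static void
translate_then_format (char *buf, size_t size,
                       const char *o1, const char *o2)
{
  snprintf (buf, size, lookup ("Replace %s with %s?"), o1, o2);
}
```

With the objects "foo" and "bar", interpolate_then_translate returns the untranslated "Replace foo with bar?". In contrast, translate_then_format produces "Ersetze foo durch bar?".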

Format strings with embedded named references

Format strings with embedded named references are different: they are suitable for internationalization, because a call to the gettext function (which returns a translated format string) can be inserted before the argument values replace the placeholders.

The format string types that allow embedded named references are:

Shell format strings.
In Python, those Python format strings that take a dictionary as argument, and the Python brace format strings.
In Ruby, those Ruby format strings that take a hash table as argument.
In Perl, the Perl brace format strings.
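The following simplified C sketch illustrates named references in the spirit of shell format strings. It expands $name and ${name} placeholders after the format string has been translated. The names expand_named and struct binding are hypothetical. Unlike a shell, the sketch copies unknown references literally and handles no other shell syntax.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair for one named placeholder.  */
struct binding
{
  const char *name;
  const char *value;
};

/* Return the value bound to the LEN-byte name at NAME, or NULL.  */
static const char *
find_binding (const struct binding *bindings, size_t n,
              const char *name, size_t len)
{
  for (size_t i = 0; i < n; i++)
    if (strlen (bindings[i].name) == len
        && strncmp (bindings[i].name, name, len) == 0)
      return bindings[i].value;
  return NULL;
}

/* Append up to LEN bytes of S at *POS in BUF (capacity SIZE), truncating
   when full and keeping BUF NUL-terminated.  */
static void
append (char *buf, size_t size, size_t *pos, const char *s, size_t len)
{
  for (size_t i = 0; i < len && *pos + 1 < size; i++)
    buf[(*pos)++] = s[i];
  buf[*pos] = '\0';
}

/* Expand $name and ${name} in FMT, which the caller has already passed
   through gettext().  Unknown or malformed references are copied as-is.  */
static void
expand_named (char *buf, size_t size, const char *fmt,
              const struct binding *bindings, size_t n)
{
  size_t pos = 0;

  if (size == 0)
    return;
  buf[0] = '\0';
  while (*fmt != '\0')
    {
      if (*fmt == '$')
        {
          const char *p = fmt + 1;
          int braced = (*p == '{');
          if (braced)
            p++;
          const char *name = p;
          /* A name starts with a letter or '_', then letters, digits, '_'.  */
          if (isalpha ((unsigned char) *p) || *p == '_')
            while (isalnum ((unsigned char) *p) || *p == '_')
              p++;
          size_t len = (size_t) (p - name);
          const char *value = NULL;
          if (len > 0 && (!braced || *p == '}'))
            value = find_binding (bindings, n, name, len);
          if (value != NULL)
            {
              append (buf, size, &pos, value, strlen (value));
              fmt = braced ? p + 1 : p;   /* also skip the closing '}' */
              continue;
            }
        }
      append (buf, size, &pos, fmt, 1);   /* ordinary character */
      fmt++;
    }
}
```

Because the placeholders are names, "Replace $object1 with ${object2}?" and a translation such as "Use ${object2} instead of $object1?" both expand correctly with the same bindings. The translator can therefore reorder them freely.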

The <inttypes.h> macros

A similar case is compile-time concatenation of strings. The ISO C 99 include file <inttypes.h> contains a macro PRId64 that can be used as a formatting directive for outputting an ‘int64_t’ integer through printf. It expands to a constant string, usually "d", "ld", "lld" or similar, depending on the platform. Assume you have code like this:

printf ("The amount is %0" PRId64 "\n", number);

The gettext tools and library have special support for these <inttypes.h> macros. You can therefore simply write

printf (gettext ("The amount is %0" PRId64 "\n"), number);

The PO file will contain the string "The amount is %0<PRId64>\n". The translators will provide a translation containing "%0<PRId64>" as well, and at runtime the gettext function’s result will contain the appropriate constant string, "d" or "ld" or "lld".
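The runtime step can be pictured as replacing the <PRId64> placeholder with the platform's PRId64 string. The helper below, instantiate_prid64, is hypothetical and only illustrates the idea. The gettext library performs this substitution internally and offers no such function.

```c
#include <inttypes.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy MSG into BUF (capacity SIZE), replacing each
   "<PRId64>" placeholder with this platform's PRId64 string, such as
   "d", "ld" or "lld".  Output is truncated if BUF is too small.  */
static void
instantiate_prid64 (char *buf, size_t size, const char *msg)
{
  static const char placeholder[] = "<PRId64>";
  const size_t placeholder_len = sizeof placeholder - 1;
  size_t pos = 0;

  if (size == 0)
    return;
  while (*msg != '\0' && pos + 1 < size)
    {
      if (strncmp (msg, placeholder, placeholder_len) == 0)
        {
          /* Splice in the platform-specific length modifier and 'd'.  */
          for (const char *r = PRId64; *r != '\0' && pos + 1 < size; r++)
            buf[pos++] = *r;
          msg += placeholder_len;
        }
      else
        buf[pos++] = *msg++;
    }
  buf[pos] = '\0';
}
```

The resulting string equals "The amount is %0" PRId64 "\n" on the platform it runs on, so it can be passed to printf together with an int64_t argument.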

This works only for the predefined <inttypes.h> macros. If you have defined your own similar macros that xgettext does not know about, say ‘MYPRId64’, the solution is to change the code like this:

char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);

That is, you put the platform-dependent code in one statement and the internationalization code in another. Note that a buffer length of 100 is safe, because all available hardware integer types are limited to 128 bits, and printing a 128-bit integer needs at most 54 characters, regardless of whether it is printed in decimal, octal or hexadecimal.
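The digit counts behind this buffer-size argument can be checked exactly. This C sketch uses a hypothetical helper, digits_of_max_u128. It represents 2^128 - 1 as four 32-bit limbs, so no 128-bit integer type is needed, and counts the digits by repeated long division.

```c
#include <stdint.h>

/* Count the digits of 2^128 - 1 in BASE (2..16) by dividing a 128-bit
   value, stored as four 32-bit limbs (most significant first), until
   the quotient becomes zero.  */
static int
digits_of_max_u128 (uint32_t base)
{
  uint32_t limb[4] = { UINT32_MAX, UINT32_MAX, UINT32_MAX, UINT32_MAX };
  int digits = 0;

  for (;;)
    {
      uint64_t rem = 0;
      int nonzero = 0;
      /* One long division by BASE yields one digit (the remainder).  */
      for (int i = 0; i < 4; i++)
        {
          uint64_t cur = (rem << 32) | limb[i];
          limb[i] = (uint32_t) (cur / base);
          rem = cur % base;
          if (limb[i] != 0)
            nonzero = 1;
        }
      digits++;
      if (!nonzero)
        return digits;
    }
}
```

It reports 39 decimal, 43 octal and 32 hexadecimal digits. Adding a sign, a base prefix such as "0x" and the terminating NUL still keeps every case well below 54 characters.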
