It is often heard that only Perl can parse Perl. This is not true. Perl cannot be parsed at all; it can only be executed. Perl has various built-in ambiguities that can only be resolved at runtime.
The following example may illustrate one common problem:
print gettext "Hello World!";
Although this example looks like a bullet-proof case of a function invocation, it is not:
open gettext, ">testfile" or die;
print gettext "Hello world!"
In this context, the string gettext looks more like a file handle. But not necessarily:
use Locale::Messages qw (:libintl_h);
open gettext ">testfile" or die;
print gettext "Hello world!";
Now, the file is probably syntactically incorrect, provided that the module Locale::Messages found first in the Perl include path exports a function gettext. But what if the module Locale::Messages really looks like this?
use vars qw (*gettext);
1;
In this case, the string gettext will be interpreted as a file handle again, and the above example will create a file testfile and write the string “Hello world!” into it. Even advanced control flow analysis will not really help:
if (0.5 < rand) {
    eval "use Sane";
} else {
    eval "use InSane";
}
print gettext "Hello world!";
If the module Sane exports a function gettext that does what we expect, and the module InSane opens a file for writing and associates the handle gettext with this output stream, we are clueless again about what will happen at runtime. It is completely unpredictable. The truth is that Perl has so many ways to fill its symbol table at runtime that it is impossible to interpret a particular piece of code without executing it.
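As a minimal sketch of one such way, the following script adds a function to the symbol table while it is already running. The names install_greet and greet are hypothetical and chosen only for this illustration; they are not part of Locale::Messages or any other module.

```perl
use strict;
use warnings;

# install_greet() is a hypothetical helper: it adds a sub named "greet"
# to the main package at runtime by assigning a code reference to a glob.
sub install_greet {
    no strict 'refs';
    *{"main::greet"} = sub { return "function got: $_[0]" };
    return;
}

# Before the call, no sub named greet exists in the symbol table.
my $before = defined(&main::greet) ? "defined" : "undefined";

install_greet();

# After the call it does, although no "sub greet" appears in the source.
my $after  = defined(&main::greet) ? "defined" : "undefined";
my $output = greet("Hello world!");

print "before: $before, after: $after, output: $output\n";
```

A scanner that only reads the source text sees no declaration of greet at all, which is why the meaning of a bare name cannot be decided by static analysis in general.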
Of course, xgettext will not execute your Perl sources while scanning for translatable strings, but rather use heuristics in order to guess what you meant.
Another problem is the ambiguity of the slash and the question mark. Their interpretation depends on the context:
# A pattern match.
print "OK\n" if /foobar/;

# A division.
print 1 / 2;

# Another pattern match.
print "OK\n" if ?foobar?;

# Conditional.
print $x ? "foo" : "bar";
The slash may either act as the division operator or introduce a pattern match, whereas the question mark may act as the ternary conditional operator or as a pattern match, too. Other programming languages like awk present similar problems, but the consequences of a misinterpretation are particularly nasty with Perl sources. In awk, for instance, a statement normally ends at the end of its line, so the parser can recover from a parsing error at the next newline and interpret the rest of the input stream correctly. Perl is different, as a pattern match is terminated by the next appearance of the delimiter (the slash or the question mark) in the input stream, regardless of the semantic context. If a slash is really a division sign but misinterpreted as a pattern match, the rest of the input file is most probably parsed incorrectly.
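The effect of such a misreading can be simulated in a few lines. This is only an illustrative sketch, not the way xgettext works internally: the source text and all variable names are made up, and the "scanner" is reduced to two calls to index.

```perl
use strict;
use warnings;

# Hypothetical source text: both slashes are divisions, and a
# translatable string sits between them.
my $source = 'my $half = $total / 2; print gettext("Hello"); my $r = $a / $b;';

# Simulate a scanner that wrongly opens a pattern at the first slash:
# everything up to the next slash is consumed as the "pattern".
my $start     = index($source, '/');
my $end       = index($source, '/', $start + 1);
my $swallowed = substr($source, $start + 1, $end - $start - 1);

# The gettext call now lies inside the supposed pattern and is not seen.
my $string_lost = index($swallowed, 'gettext("Hello")') >= 0 ? 1 : 0;

print "swallowed as pattern: [$swallowed]\n";
print "gettext call hidden: $string_lost\n";
```

In a real file the next slash may be many lines away, so the amount of text that goes missing is arbitrary.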
There are certain cases where the ambiguity cannot be resolved at all:
$x = wantarray ? 1 : 0;
The Perl built-in function wantarray does not accept any arguments. The Perl parser therefore knows that the question mark does not start a regular expression but is the ternary conditional operator.
sub wantarrays {}
$x = wantarrays ? 1 : 0;
Now the situation is different. The function wantarrays takes a variable number of arguments (like any non-prototyped Perl function). The question mark is now the delimiter of a pattern match, and hence the piece of code does not compile.
sub wantarrays() {}
$x = wantarrays ? 1 : 0;
Now the function is prototyped, Perl knows that it does not accept any arguments, and the question mark is therefore interpreted as the ternary operator again. But that unfortunately outsmarts xgettext.
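The influence of a prototype can be checked with the slash as well. The following is a minimal sketch that compiles two snippets with string eval, so that a compile error in one of them does not abort the script. The function names halves_a and halves_b are hypothetical.

```perl
use strict;
use warnings;

# Without a prototype, the slash after the function name is taken as
# the start of a pattern, which is never terminated.
my $unprototyped = 'sub halves_a { 4 } my $v = halves_a / 2; 1';

# With an empty prototype, the same slash is the division operator.
my $prototyped   = 'sub halves_b() { 4 } my $v = halves_b / 2; 1';

# eval returns the final 1 on success and undef on a compile error.
my $unprototyped_compiles = eval($unprototyped) ? 1 : 0;
my $prototyped_compiles   = eval($prototyped)   ? 1 : 0;

print "without prototype compiles: $unprototyped_compiles\n";
print "with prototype compiles:    $prototyped_compiles\n";
```

The two snippets differ only in the pair of parentheses after the function name, and a declaration like that may well live in a different file than the call.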
The Perl parser in xgettext cannot know whether a function has a prototype and what that prototype would look like. It therefore makes an educated guess. If a function is known to be a Perl built-in and this function does not accept any arguments, a following question mark or slash is treated as an operator, otherwise as the delimiter of a following regular expression. The Perl built-ins that do not accept arguments are wantarray, fork, time, times, getlogin, getppid, getpwent, getgrent, gethostent, getnetent, getprotoent, getservent, setpwent, setgrent, endpwent, endgrent, endhostent, endnetent, endprotoent, and endservent.
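The rule just described can be sketched as a simple table lookup. This is an illustration of the rule as stated above, not the actual xgettext implementation; the names %takes_no_arguments and guess_role are assumptions made for this example.

```perl
use strict;
use warnings;

# The built-ins listed above as accepting no arguments.
my %takes_no_arguments = map { $_ => 1 } qw(
    wantarray fork time times getlogin getppid getpwent getgrent
    gethostent getnetent getprotoent getservent setpwent setgrent
    endpwent endgrent endhostent endnetent endprotoent endservent
);

# guess_role() is a hypothetical helper: given the name that precedes a
# slash or question mark, it returns how that character would be read.
sub guess_role {
    my ($preceding_name) = @_;
    return $takes_no_arguments{$preceding_name}
        ? 'operator'
        : 'pattern delimiter';
}

for my $name (qw(wantarray time wantarrays somefunc)) {
    printf "%-10s -> %s\n", $name, guess_role($name);
}
```

Any user-defined function falls into the second branch, regardless of whether it was declared with an empty prototype.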
If you find that xgettext fails to extract strings from portions of your sources, you should therefore look out for slashes and/or question marks preceding these sections. You may have come across a bug in xgettext’s Perl parser (and of course you should report that bug). In the meantime, you should consider reformulating your code in a manner less challenging to xgettext.
In particular, if the parser is too dumb to see that a function does not accept arguments, use parentheses:
$x = somefunc() ? 1 : 0;
$y = (somefunc) ? 1 : 0;
In fact the Perl parser itself has similar problems and warns you about such constructs.
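A self-contained variant of the parenthesized forms, which also covers the slash, might look as follows. The function answer is hypothetical and exists only so that the script can run.

```perl
use strict;
use warnings;

# answer() is a hypothetical function used only for this illustration.
sub answer { return 42 }

# Explicit parentheses end the argument list before the operator, so
# neither the question mark nor the slash can start a pattern.
my $x    = answer() ? 1 : 0;
my $y    = (answer) ? 1 : 0;
my $half = answer() / 2;

print "$x $y $half\n";
```

Both spellings leave no room for an argument list after the function name, which is the part a scanner cannot decide on its own.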