9.3.2 Dynamic Typing In gawk

Hc Svnt Dracones (“Here be dragons”)

The Lenox Globe

Things in gawk can be a little more unexpected. Because gawk allows arrays of arrays, the same dynamic typing can be applied to array elements that have been created but not used.

BEGIN {
    funky(a[1])
    if (A == 0)
        print "<" a[1] ">"
    else
        print a[1][1]
}

function funky(arr)
{
    if (A == 0)
        arr = 1
    else
        arr[1] = 1
}

When run, the results are the same as in the earlier case:

$ gawk -v A=0 -f funky2.awk
-| <>
$ gawk -v A=1 -f funky2.awk
-| 1

The typeof() function provides a “window” into gawk’s inner workings. Let’s see how using a variable or array element can change its type dynamically, starting with a variable used as a scalar:

BEGIN {
    print typeof(a)         # we don't know what a is yet
    printf("a = %d\n", a)   # use it as a scalar
    print typeof(a)         # now we know it's not an array
}

When run:

$ gawk -f typing1.awk
-| untyped
-| a = 0
-| unassigned

Initially, a is untyped, since we don’t know yet if it’s an array or scalar. After using it in the call to printf(), we know it’s a scalar. However, since it was never given a concrete value (number, string, or regexp), its type is unassigned.

gawk is peculiar in that we can do the same thing with an array element: first make a into an array by referencing a[1], then use a[1] as a scalar:

BEGIN {
    print typeof(a)               # we don't know what a is yet
    a[1]                          # make a into an array
    print typeof(a[1])            # but we don't know what a[1] is yet
    printf("a[1] = %d\n", a[1])   # use it as a scalar
    print typeof(a[1])            # now we know it's not an array
}

When run:

$ gawk -f typing2.awk
-| untyped
-| untyped
-| a[1] = 0
-| unassigned

Normally, passing a variable that has never been used to a built-in function causes it to become a scalar variable (unassigned). However, isarray() and typeof() are different; they do not change their arguments from untyped to unassigned.

As we saw, this applies to both variables denoted by simple identifiers and array elements that come into existence simply by referencing them:

$ gawk 'BEGIN { print typeof(x) }'
-| untyped
$ gawk 'BEGIN { print typeof(x["foo"]) }'
-| untyped

Note that prior to version 5.2, array elements that come into existence simply by referencing them behaved differently; they were automatically forced to be scalars:

$ gawk-5.1.1 'BEGIN { print typeof(x) }'
-| untyped
$ gawk-5.1.1 'BEGIN { print typeof(x["foo"]) }'
-| unassigned
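The contrast between ordinary built-in functions and isarray() or typeof() can be checked directly. The following is a minimal sketch, assuming a gawk recent enough to provide typeof() is installed as gawk. The file name typing3.awk and the variables x and y are made up for illustration, and toupper() merely stands in for any built-in function that expects a scalar argument:

```shell
# Hypothetical file name; any name works.
cat > typing3.awk <<'EOF'
BEGIN {
    print typeof(x)     # x has never been used
    print isarray(x)    # isarray() only inspects x
    print typeof(x)     # so x is still untyped
    y = toupper(x)      # an ordinary built-in uses x as a scalar
    print typeof(x)     # x is now a scalar with no value
}
EOF
gawk -f typing3.awk
```

When run, this should print:

-| untyped
-| 0
-| untyped
-| unassigned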

To sum up, variables and array elements get their nature (array or scalar) “fixed” upon first use. This can lead to some weird cases, and it is best to avoid taking advantage of gawk’s dynamic nature, other than in the standard manner of passing untyped variables and array elements as function parameters.
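As an illustration of that standard manner, here is a minimal sketch of a function that receives an untyped variable and fixes it as an array on first use. It assumes a gawk recent enough to provide typeof() is installed as gawk; the file name typing4.awk, the function fill(), and the variable data are hypothetical names chosen for this example:

```shell
# Hypothetical file name; any name works.
cat > typing4.awk <<'EOF'
function fill(arr)
{
    arr["k"] = "v"        # first use inside the function fixes arr as an array
}

BEGIN {
    print typeof(data)    # data has never been used
    fill(data)            # pass it, still untyped, as a parameter
    print typeof(data)    # the caller's variable is now an array
    print data["k"]
}
EOF
gawk -f typing4.awk
```

When run, this should print:

-| untyped
-| array
-| v

The caller never has to declare data; its nature is decided by what fill() does with the parameter.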