Setting up a web service that allows user interaction is more difficult and
shows us the limits of network access in gawk. In this section,
we develop a main program (a BEGIN pattern and its action)
that will become the core of event-driven execution controlled by a
graphical user interface (GUI).
Each HTTP event that the user triggers by some action within the browser
is received in this central procedure. Parameters and menu choices are
extracted from this request, and an appropriate measure is taken according to
the user’s choice:
BEGIN {
  if (MyHost == "") {
     "uname -n" | getline MyHost
     close("uname -n")
  }
  if (MyPort ==  0) MyPort = 8080
  HttpService = "/inet/tcp/" MyPort "/0/0"
  MyPrefix    = "http://" MyHost ":" MyPort
  SetUpServer()
  while ("awk" != "complex") {
    # header lines are terminated this way
    RS = ORS = "\r\n"
    Status   = 200          # this means OK
    Reason   = "OK"
    Header   = TopHeader
    Document = TopDoc
    Footer   = TopFooter
    if        (GETARG["Method"] == "GET") {
        HandleGET()
    } else if (GETARG["Method"] == "HEAD") {
        # not yet implemented
    } else if (GETARG["Method"] != "") {
        print "bad method", GETARG["Method"]
    }
    Prompt = Header Document Footer
    print "HTTP/1.0", Status, Reason       |& HttpService
    print "Connection: Close"              |& HttpService
    print "Pragma: no-cache"               |& HttpService
    len = length(Prompt) + length(ORS)
    print "Content-length:", len           |& HttpService
    print ORS Prompt                       |& HttpService
    # ignore all the header lines
    while ((HttpService |& getline) > 0)
      ;
    # stop talking to this client
    close(HttpService)
    # wait for new client request
    HttpService |& getline
    # do some logging
    print systime(), strftime(), $0
    # read request parameters
    CGI_setup($1, $2, $3)
  }
}
This web server presents menu choices in the form of HTML links. Therefore, it has to tell the browser the name of the host it is residing on. When starting the server, the user may supply the name of the host from the command line with ‘gawk -v MyHost="Rumpelstilzchen"’. If the user does not do this, the server looks up the name of the host it is running on for later use as a web address in HTML documents. The same applies to the port number. These values are inserted later into the HTML content of the web pages to refer to the home system.
Each server that is built around this core has to initialize some
application-dependent variables (such as the default home page) in a function
SetUpServer(), which is called immediately before entering the
infinite loop of the server. For now, we will write an instance that
initiates a trivial interaction. With this home page, the client user
can click on two possible choices, and receive the current date either
in human-readable format or in seconds since 1970:
function SetUpServer() {
  TopHeader = "<HTML><HEAD>"
  TopHeader = TopHeader \
     "<title>My name is GAWK, GNU AWK</title></HEAD>"
  TopDoc    = "<BODY><h2>\
    Do you prefer your date <A HREF=" MyPrefix \
    "/human>human</A> or \
    <A HREF=" MyPrefix "/POSIX>POSIXed</A>?</h2>" ORS ORS
  TopFooter = "</BODY></HTML>"
}
On the first run through the main loop, the default line terminators are
set and the default home page is copied to the actual home page. Since this
is the first run, GETARG["Method"] is not initialized yet, hence the
case selection over the method does nothing. Now that the home page is
initialized, the server can start communicating with a client browser.
It does so by printing the HTTP header into the network connection (‘print … |& HttpService’). This command blocks execution of the server script until a client connects.
If you compare this server script with the primitive one we wrote before, you will notice two additional lines in the header. The first instructs the browser to close the connection after each request. The second tells the browser that it should never try to remember earlier requests that had identical web addresses (no caching). Otherwise, it could happen that the browser retrieves the time of day in the previous example just once, and later it takes the web page from the cache, always displaying the same time of day although time advances each second.
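The header lines discussed above can be collected into one helper, which makes the framing of a reply and the Content-length arithmetic easier to see. What follows is only a sketch: the function name BuildResponse() and its parameters are hypothetical and do not appear in the server above, and the length calculation assumes that every character of the body occupies exactly one byte.

```awk
# Hypothetical helper (not part of the server above): assemble a complete
# HTTP/1.0 reply as one string instead of printing it line by line.
function BuildResponse(status, reason, body,    crlf, head) {
    crlf = "\r\n"                                # HTTP line terminator
    head = "HTTP/1.0 " status " " reason crlf
    head = head "Connection: Close" crlf         # one request per connection
    head = head "Pragma: no-cache" crlf          # ask the browser not to cache
    # The body is followed by one terminator, which is counted as well.
    # Assumption: one byte per character.
    head = head "Content-length: " length(body crlf) crlf
    # An empty line separates the header from the body.
    return head crlf body crlf
}

BEGIN {
    resp = BuildResponse(200, "OK", "<HTML><BODY>hello</BODY></HTML>")
    gsub(/\r/, "", resp)     # drop carriage returns for terminal display only
    printf "%s", resp
}
```

In the server loop the same lines are sent with separate print statements, so that ORS supplies the terminator after each one.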
Having supplied the initial home page to the browser with a valid document
stored in the variable Prompt, the server closes the connection and waits
for the next request. When the request comes, a log line is printed that
allows us to see which request the server receives. The final step in the
loop is to call the function CGI_setup(), which reads all the lines
of the request (coming from the browser), processes them, and stores the
transmitted parameters in the array PARAM. The complete
text of these application-independent functions can be found in
A Simple CGI Library.
For now, we use a simplified version of CGI_setup():
function CGI_setup(   method, uri, version, i) {
  delete GETARG;         delete MENU;        delete PARAM
  GETARG["Method"] = $1
  GETARG["URI"] = $2
  GETARG["Version"] = $3
  i = index($2, "?")
  # is there a "?" indicating a CGI request?
  if (i > 0) {
    split(substr($2, 1, i-1), MENU, "[/:]")
    split(substr($2, i+1), PARAM, "&")
    for (i in PARAM) {
      j = index(PARAM[i], "=")
      GETARG[substr(PARAM[i], 1, j-1)] = \
                                  substr(PARAM[i], j+1)
    }
  } else {    # there is no "?", no need for splitting PARAMs
    split($2, MENU, "[/:]")
  }
}
At first, the function clears all variables used for
global storage of request parameters. The rest of the function serves
the purpose of filling the global parameters with the extracted new values.
To accomplish this, the name of the requested resource is split into
parts and stored for later evaluation. If the request contains a ‘?’,
then the request has CGI variables seamlessly appended to the web address.
Everything in front of the ‘?’ is split up into menu items, and
everything behind the ‘?’ is a list of ‘variable=value’ pairs
(separated by ‘&’) that also need splitting. This way, CGI variables are
isolated and stored. This procedure lacks recognition of special characters
that are transmitted in coded form. Here, any
optional request header and body parts are ignored. We do not need
header parameters and the request body. However, when refining our approach or
working with the POST and PUT methods, reading the header and body
becomes inevitable. Header parameters should then be stored in a global
array as well as the body.
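One way to recognize the coded characters mentioned above is a small decoding function. The following is a minimal sketch, not the version from the CGI library: the names UrlDecode() and HexValue() are hypothetical, ‘+’ is treated as a space on the assumption that the data comes from a form submission, and only ASCII values are handled reliably, because the result of ‘%c’ for values above 127 depends on the locale.

```awk
# Hypothetical helper: value of one hexadecimal digit, or -1 if invalid.
function HexValue(c) {
    return index("0123456789ABCDEF", toupper(c)) - 1
}

# Hypothetical helper: decode "%XX" sequences and "+" in a CGI value.
# Assumption: the decoded characters are plain ASCII.
function UrlDecode(s,    out, i, c, hi, lo) {
    out = ""
    gsub(/\+/, " ", s)           # form submissions encode a space as "+"
    i = 1
    while (i <= length(s)) {
        c = substr(s, i, 1)
        if (c == "%" && i + 2 <= length(s)) {
            hi = HexValue(substr(s, i + 1, 1))
            lo = HexValue(substr(s, i + 2, 1))
            if (hi >= 0 && lo >= 0) {
                # a numeric argument to %c yields the character with that code
                out = out sprintf("%c", hi * 16 + lo)
                i += 3
                continue
            }
        }
        out = out c              # ordinary character or malformed escape
        i++
    }
    return out
}

BEGIN {
    print UrlDecode("a%20b%3Dc+d")
}
```

In CGI_setup(), such a function could be applied to both substr() results before they are stored in GETARG.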
On each subsequent run through the main loop, one request from a browser is
received, evaluated, and answered according to the user’s choice. This can be
done by letting the value of the HTTP method guide the main loop into
execution of the procedure HandleGET(), which evaluates the user’s
choice. In this case, we have only one hierarchical level of menus,
but in the general case, menus are nested.
The menu choices at each level are
separated by ‘/’, just as in file names. Notice how simple it is to
construct menus of arbitrary depth:
function HandleGET() {
  if (       MENU[2] == "human") {
    Footer = strftime() TopFooter
  } else if (MENU[2] == "POSIX") {
    Footer = systime()  TopFooter
  }
}
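To illustrate how nesting could look, here is a sketch of a two-level menu. The function Route(), the paths ‘/date/human’, ‘/date/POSIX’, and ‘/host’, and the returned labels are all hypothetical; a real handler would set Document or Footer instead of returning a label. Note that the first element of the split is empty because the path starts with ‘/’, which is why the top-level choice is found at index 2.

```awk
# Hypothetical two-level menu dispatcher; the paths are invented for
# illustration and are not served by the program above.
function Route(uri,    menu) {
    split(uri, menu, "[/:]")     # menu[1] is "" because uri begins with "/"
    if (menu[2] == "date") {
        # second level: choices below /date
        if (menu[3] == "human") return "date-human"
        if (menu[3] == "POSIX") return "date-posix"
        return "date-menu"       # /date alone: show the submenu
    }
    if (menu[2] == "host") return "host"
    return "home"                # anything else: top-level page
}

BEGIN {
    print Route("/date/human")
    print Route("/date")
    print Route("/")
}
```

Each additional level adds one more index into the split array, so deeper menus need no change to the parsing itself.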
The disadvantage of this approach is that our server is slow and can
handle only one request at a time. Its main advantage, however, is that
the server consists of just one gawk program. No need for installing an
httpd, and no need for static separate HTML files, CGI scripts, or
root privileges. This is rapid prototyping.
This program can be started on the same host that runs your browser.
Then let your browser point to http://localhost:8080.
It is also possible to include images in the HTML pages.
Most browsers support the not very well-known .xbm format,
which may contain only monochrome pictures but is an ASCII format.
Binary images are possible but
not so easy to handle. Another way of including images is to generate them
with a tool such as GNUPlot,
by calling the tool with the system()
function or through a pipe.
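Because an .xbm picture is plain text, it can be produced with ordinary string operations. The sketch below builds a small striped bitmap; the function MakeXBM(), the picture name, and the pattern are hypothetical, and the sketch assumes that the size is a multiple of 8 so that each row consists of whole bytes.

```awk
# Hypothetical helper: return the text of a square monochrome XBM image
# whose rows alternate between two bit patterns.
# Assumption: size is a multiple of 8.
function MakeXBM(name, size,    row, col, out, sep) {
    out = "#define " name "_width " size "\n"
    out = out "#define " name "_height " size "\n"
    out = out "static unsigned char " name "_bits[] = {\n"
    sep = ""
    for (row = 0; row < size; row++) {
        for (col = 0; col < size / 8; col++) {
            # 85 is 0x55 and 170 is 0xaa: alternating pixels, shifted
            # by one position on every other row
            out = out sep sprintf("0x%02x", (row % 2) ? 170 : 85)
            sep = ", "
        }
    }
    return out "\n};\n"
}

BEGIN {
    printf "%s", MakeXBM("checker", 8)
}
```

The returned text could be sent as the body of a reply in place of an HTML page; the reply would presumably also need a suitable content type header, which the server above does not send.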