Kawa: Processes

A process is a native (operating-system-level) application or program that runs separately from the current virtual machine.

Many programming languages have facilities to allow access to system processes (commands). (For example, Java has java.lang.Process and java.lang.ProcessBuilder.) These facilities let you send data to the standard input, extract the resulting output, look at the return code, and sometimes even pipe commands together. However, this is rarely as easy as it is in the old Bourne shell; for example, command substitution is awkward. Kawa’s solution is based on these two ideas:

A “process expression” (typically a function call) evaluates to an LProcess value, which provides access to a Unix-style (or Windows) process.
In a context requiring a string (or a bytevector), an LProcess is automatically converted to a string (or bytevector) comprising the standard output from the process.

Creating a process

The most flexible way to start a process is with either the run-process procedure or the &`{command} syntax for process literals.

Procedure: run-process process-keyword-argument* command

Creates a process object, specifically a gnu.kawa.functions.LProcess object. A process-keyword-argument can be used to set various options, as discussed below.

The command is the process command-line (name and arguments). It can be an array of strings, in which case those are used as the command arguments directly:
(run-process ["ls" "-l"])
The command can also be a single string, which is split (tokenized) into command arguments separated by whitespace. Quotation groups words together just like traditional shells:
(run-process "cmd a\"b 'c\"d k'l m\"n'o")
   ⇒ (run-process ["cmd"   "ab 'cd"   "kl m\"no"])
The syntax shorthand &`{command} or &sh{command} (discussed below) is usually more convenient.

process-keyword-argument ::=
    process-redirect-argument
  | process-environment-argument
  | process-misc-argument

We discuss process-redirect-argument and process-environment-argument later. The process-misc-argument options are just the following:

shell: shell

Currently, shell must be one of #f (which is ignored) or #t. The latter means to use an external shell to tokenize the command. I.e. the following are equivalent:

(run-process shell: #t "command")
(run-process ["/bin/sh" "-c" "command"])

directory: dir

Change the working directory of the new process to dir.
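
As a small illustration of these options, the following sketch combines directory: and shell:. The /tmp path and the *.log pattern are placeholders, and passing the directory as a plain string is an assumption (the text above only says that dir names the new working directory).

```scheme
;; List /tmp without changing the working directory of Kawa itself.
;; Assumption: directory: accepts a string path.
(define listing ::string
  (run-process directory: "/tmp" ["ls" "-l"]))
(display listing)

;; With shell: #t the command string is handed to /bin/sh -c,
;; so the shell (not Kawa) tokenizes it and expands the "*.log" pattern.
(define logs ::string
  (run-process shell: #t directory: "/tmp" "ls *.log"))
(display logs)
```

Declaring the variables as ::string forces the process output to be collected, as described under “Process values and process output”.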

Process literals

A simple process literal is a kind of named literal that uses the backtick character (`) as the cname. For example:

&`{date --utc}

This is equivalent to:

(run-process "date --utc")

In general the following are roughly equivalent (using string quasi-literals):

&`[args...]{command}
(run-process args... &{command})

The equivalence is only rough because command may contain escaped sub-expressions; in that case &` may process the resulting values differently from plain string substitution, as discussed below.

If you use &sh instead of &` then a shell is used:

&sh{rm *.class}

which is equivalent to:

&`{/bin/sh -c "rm *.class"}

In general, the following are equivalent:

&sh[args...]{command}
&`[shell: #t args...]{command}

Process values and process output

The value returned from a call to run-process or a process literal is an instance of gnu.kawa.functions.LProcess. This class extends java.lang.Process, so you can treat it as any other Process object.

#|kawa:1|# (define p1 &`{date --utc})
#|kawa:2|# (p1:toString)
gnu.kawa.functions.LProcess@377dca04
#|kawa:3|# (write p1)
gnu.kawa.functions.LProcess@377dca04

What makes an LProcess interesting is that it is also a blob, which is automatically converted to a string (or bytevector) in a context that requires it. The contents of the blob come from the standard output of the process. The blob is evaluated lazily, so data is only collected when requested.

#|kawa:4|# (define s1 ::string p1)
#|kawa:5|# (write s1)
"Wed Jan  1 01:18:21 UTC 2014\n"
#|kawa:6|# (define b1 ::bytevector p1)
(write b1)
#u8(87 101 100 32 74 97 110 ... 52 10)

The display procedure prints it in “human” form, as a string:

#|kawa:7|# (display p1)
Wed Jan  1 01:18:21 UTC 2014

This is also the default REPL formatting:

#|kawa:8|# &`{date --utc}
Wed Jan  1 01:18:22 UTC 2014

When you type a command to a shell, its output goes to the console. Similarly, in a REPL the output from the process is copied to the console output, which can sometimes be optimized by letting the process inherit its standard output from the Kawa process.
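
As the transcript shows, the captured text keeps the trailing newline written by the command. A minimal sketch of stripping it follows; chomp is a hypothetical helper (not part of Kawa) written in portable Scheme.

```scheme
;; Hypothetical helper: drop one trailing newline, if present.
(define (chomp s)
  (let ((n (string-length s)))
    (if (and (> n 0) (char=? (string-ref s (- n 1)) #\newline))
        (substring s 0 (- n 1))
        s)))

;; Force the process output to a string first, then clean it up.
(define raw ::string &`{date --utc})
(define today (chomp raw))
```

Converting to a string before calling the helper keeps the point where the output is collected explicit.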

Substitution and tokenization

To substitute the variable or the result of an expression in the command line use the usual syntax for quasi literals:

(define filename (make-temporary-file))
&sh{run-experiment >&[filename]}

Since a process is convertible to a string, we need no special syntax for command substitution:

&`{echo The directory is: &[&`{pwd}]}

or equivalently:

&`{echo The directory is: &`{pwd}}

Things get more interesting when considering the interaction between substitution and tokenization. This is not simple string interpolation. For example, if an interpolated value contains a quote character, we want to treat it as a literal quote, rather than a token delimiter. This matches the behavior of traditional shells. There are multiple cases, depending on whether the interpolation result is a string or a vector/list, and depending on whether the interpolation is inside quotes.

If the value is a string, and we’re not inside quotes, then all non-whitespace characters (including quotes) are literal, but whitespace still separates tokens:
```
(define v1 "a b'c ")
&`{cmd x y&[v1]z}   ⇒  (run-process ["cmd" "x" "ya" "b'c" "z"])
```
If the value is a string, and we are inside single quotes, all characters (including whitespace) are literal.
```
&`{cmd 'x y&[v1]z'}   ⇒  (run-process ["cmd" "x ya b'c z"])
```
Double quotes work the same except that newline is an argument separator. This is useful when you have one filename per line, and the filenames may contain spaces, as in the output from find:
```
&`{ls -l "&`{find . -name '*.pdf'}"}
```
This solves a problem that is quite painful with traditional shells.
If the value is a vector or list (of strings), and we’re not inside quotes, then each element of the array becomes its own argument, as-is:
```
(define v2 ["a b" "c\"d"])
&`{cmd &[v2]}  ⇒  (run-process ["cmd" "a b" "c\"d"])
```
However, if the enclosed expression is adjacent to non-space non-quote characters, those are prepended to the first element, or appended to the last element, respectively.
```
&`{cmd x&[v2]y}   ⇒  (run-process ["cmd" "xa b" "c\"dy"])
&`{cmd x&[[]]y}   ⇒  (run-process ["cmd" "xy"])
```
This behavior is similar to how shells handle "$@" (or "${name[@]}" for general arrays), though in Kawa you would leave off the quotes.

Note the equivalence:
```
&`{&[array]}   ⇒  (run-process array)
```
If the value is a vector or list (of strings), and we are inside quotes, it is equivalent to interpolating a single string resulting from concatenating the elements separated by a space:
```
&`{cmd "&[v2]"}
 ⇒  (run-process ["cmd" "a b c\"d"])
```
This behavior is similar to how shells handle "$*" (or "${name[*]}" for general arrays).
If the value is the result of a call to unescaped-data then it is parsed as if it were literal. For example a quote in the unescaped data may match a quote in the literal:
```
(define vu (unescaped-data "b ' c d '"))
&`{cmd 'a &[vu]z'}   ⇒  (run-process ["cmd" "a b " "c" "d" "z"])
```
If we’re using a shell to tokenize the command, then we add quotes or backslashes as needed so that the shell will tokenize as described above:
```
(define authors ["O'Conner" "de Beauvoir"])
&sh{list-books &[authors]}
```
The command passed to the shell is:
```
list-books 'O'\''Conner' 'de Beauvoir'
```
Having quoting be handled by the $construct$:sh implementation automatically eliminates common code injection problems.
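
The quoting shown for the authors example can be reproduced by hand. The following is only an illustrative sketch in portable Scheme, not the actual Kawa implementation; sh-quote and sh-command-line are hypothetical names.

```scheme
;; Hypothetical helper: wrap one argument in single quotes for /bin/sh.
;; An embedded single quote is written as '\'' (close the quoted run,
;; emit an escaped quote, reopen the quoted run).
(define (sh-quote arg)
  (let loop ((chars (string->list arg))
             (acc (list #\')))          ; accumulated in reverse order
    (cond ((null? chars)
           (list->string (reverse (cons #\' acc))))
          ((char=? (car chars) #\')
           (loop (cdr chars) (append (list #\' #\' #\\ #\') acc)))
          (else
           (loop (cdr chars) (cons (car chars) acc))))))

;; Hypothetical helper: build a shell command line from a command name
;; and a list of argument strings.
(define (sh-command-line cmd args)
  (apply string-append cmd
         (map (lambda (a) (string-append " " (sh-quote a))) args)))

(display (sh-command-line "list-books" '("O'Conner" "de Beauvoir")))
;; prints: list-books 'O'\''Conner' 'de Beauvoir'
```

Quoting every argument unconditionally is the simplest safe choice; the text above only promises that quotes or backslashes are added as needed, so the real output may differ in form for arguments that need no quoting.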

Smart tokenization only happens when using the quasi-literal forms such as &`{command}. You can of course use string templates with run-process:

(run-process &{echo The directory is: &`{pwd}})

However, in that case there is no smart tokenization: The template is evaluated to a string, and then the resulting string is tokenized, with no knowledge of where expressions were substituted.
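
The difference can be made concrete with a value that contains quote characters. This sketch applies the tokenization rules stated earlier in this section; cmd is a placeholder command name, as in the examples above.

```scheme
(define v "a 'b c'")

;; Quasi-literal form: the quotes inside v are literal characters,
;; while whitespace inside v still separates tokens.
&`{cmd &[v]}                 ; ⇒ (run-process ["cmd" "a" "'b" "c'"])

;; String template: the text "cmd a 'b c'" is built first and only
;; then tokenized, so the quotes that came from v now group words.
(run-process &{cmd &[v]})    ; ⇒ (run-process ["cmd" "a" "b c"])
```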

Input/output redirection

You can use various keyword arguments to specify standard input, output, and error streams. For example to lower-case the text in in.txt, writing the result to out.txt, you can do:

&`[in-from: "in.txt" out-to: "out.txt"]{tr A-Z a-z}

or:

(run-process in-from: "in.txt" out-to: "out.txt" "tr A-Z a-z")

A process-redirect-argument can be one of the following:

in: value

The value is evaluated, converted to a string (as if using display), and copied to the standard input of the process. The following are equivalent:

&`[in: "text\n"]{command}
&`[in: &`{echo "text"}]{command}

You can pipe the output from command1 to the input of command2 as follows:

&`[in: &`{command1}]{command2}

in-from: path

The process reads its input from the specified path, which can be any value coercible to a filepath.

out-to: path

The process writes its output to the specified path.

err-to: path

Similarly for the error stream.

out-append-to: path

err-append-to: path

Similar to out-to and err-to, but append to the file specified by path, instead of replacing it.

in-from: 'pipe

out-to: 'pipe

err-to: 'pipe

Does not set up redirection. Instead, the specified stream is available using the methods getOutputStream, getInputStream, or getErrorStream, respectively, on the resulting Process object, just like Java’s ProcessBuilder.Redirect.PIPE.

in-from: 'inherit

out-to: 'inherit

err-to: 'inherit

Inherits the standard input, output, or error stream from the current JVM process.

out-to: port

err-to: port

Redirects the standard output or error of the process to the specified port.

out-to: 'current

err-to: 'current

Same as out-to: (current-output-port), or err-to: (current-error-port), respectively.

in-from: port

in-from: 'current

Redirects standard input to read from the port (or (current-input-port)). It is unspecified how much is read from the port. (The implementation is to use a thread that reads from the port, and sends it to the process, so it might read to the end of the port, even if the process doesn’t read it all.)

err-to: ’out

Redirect the standard error of the process to be merged with the standard output.

The default for the error stream (if neither err-to nor err-append-to is specified) is equivalent to err-to: 'current.

Note: Writing to a port is implemented by copying the output or error stream of the process. This is done in a thread, which means we don’t have any guarantees when the copying is finished. (In the future we might change process-exit-wait (discussed later) to wait for not only the process to finish, but also for these helper threads to finish.)
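
Several of these arguments can be combined in one process expression. The sketch below collects the output of two build steps in a single log file; the file name build.log and the make targets are hypothetical, and it assumes that err-to: 'out also follows standard output when that is redirected to a file.

```scheme
(define log "build.log")   ; hypothetical log file name

;; First step: replace any existing log, with standard error merged
;; into standard output.
(process-exit-wait &`[out-to: log err-to: 'out]{make clean})

;; Second step: append to the same log instead of replacing it.
(process-exit-wait &`[out-append-to: log err-to: 'out]{make all})
```

Waiting for the first process before starting the second keeps the two steps from writing to the log at the same time.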

A here document is a form of literal string, typically multi-line, and commonly used in shells for the standard input of a process. You can use string literals or string quasi-literals for this. For example, this passes the string "line1\nline2\nline3\n" to the standard input of command:

(run-process in: &{
    &|line1
    &|line2
    &|line3
    } "command")

Note the use of &| to mark the end of ignored indentation.

Pipe-lines

Piping the output of one process as the input of another is in principle easy - just use the in: process argument. However, writing a multi-stage pipe-line quickly gets ugly:

&`[in: &`[in: "My text\n"]{tr a-z A-Z}]{wc}

The convenience macro pipe-process makes this much nicer:

(pipe-process
  "My text\n"
  &`{tr a-z A-Z}
  &`{wc})

Syntax: pipe-process input process*

All of the process expressions must be run-process forms, or equivalent &`{command} forms. The result of evaluating input becomes the input to the first process; the output from the first process becomes the input to the second process, and so on. The result of the whole pipe-process expression is that of the last process.

Copying the output of one process to the input of the next is optimized: it uses a copying loop in a separate thread. Thus you can safely pipe long-running processes that produce huge output. This isn’t quite as efficient as using an operating system pipe, but is portable and works pretty well.
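
Because the result of pipe-process is that of the last process, it can be captured like any other process value. A short sketch, using arbitrary sample text and the standard tr and sort commands:

```scheme
;; Upper-case three lines of text, then sort them.
(define shouted ::string
  (pipe-process
    "pear\napple\nfig\n"
    &`{tr a-z A-Z}
    &`{sort}))

;; The pipeline output is now an ordinary string.
(display shouted)
```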

Setting the process environment

By default the new process inherits the system environment of the current (JVM) process as returned by System.getenv(), but you can override it. A process-environment-argument can be one of the following:

env-name: value

In the process environment, set the variable name to the specified value. For example:

&`[env-CLASSPATH: ".:classes"]{java MyClass}

NAME: value

Same as using the env-NAME option above, but only if the NAME is uppercase (i.e. if uppercasing NAME yields the same string). For example the previous example could be written:

&`[CLASSPATH: ".:classes"]{java MyClass}

environment: env

The env is evaluated and must yield a HashMap. This map is used as the system environment of the process.
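
The two keyword forms can be mixed freely. In this sketch the variable values and the proxy URL are made up, and printenv is assumed to be available on the system:

```scheme
;; Upper-case variable names can be used directly as keywords.
(define utc-now ::string &`[TZ: "UTC" LC_ALL: "C"]{date})

;; A lower-case variable name needs the env- prefix.
(define proxy ::string
  &`[env-http_proxy: "http://proxy.example:3128"]{printenv http_proxy})
```

Variables that are not mentioned are still inherited from the Kawa process, as described at the start of this section.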

Waiting for process exit

When a process finishes, it returns an integer exit code. The code is traditionally 0 on successful completion, while a non-zero code indicates some kind of failure or error.

Procedure: process-exit-wait process

The process expression must evaluate to a process (any java.lang.Process object). This procedure waits for the process to finish, and then returns the exit code as an int.
(process-exit-wait (run-process "echo foo")) ⇒ 0

Procedure: process-exit-ok? process

Calls process-exit-wait, and then returns #true if the process exited with code 0, and returns #false otherwise.

This is useful for emulating the way traditional shells do control flow based on the exit code. For example in sh you might write:
if grep Version Makefile >/dev/null
then echo found Version
else echo no Version
fi
The equivalent in Kawa:
(if (process-exit-ok? &`{grep Version Makefile})
  &`{echo found}
  &`{echo not found})
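Strictly speaking these are not quite the same: the sh version discards the output from grep explicitly with >/dev/null, while the Kawa version silently throws it away because no-one has asked for it. To pass the output from grep through to the console instead, as sh does without the redirection, you can use out-to: 'inherit: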
(if (process-exit-ok? &`[out-to: 'inherit]{grep Version Makefile})
  &`{echo found}
  &`{echo not found})
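
The same idea extends to the shell operators && and ||. Since and and or evaluate their operands left to right and stop early, a later process expression is only evaluated when the earlier exit status calls for it. The make targets and the Makefile check below are placeholders:

```scheme
;; sh:  make && make install
(and (process-exit-ok? &`[out-to: 'inherit]{make})
     (process-exit-ok? &`[out-to: 'inherit]{make install}))

;; sh:  test -f Makefile || echo no Makefile
(or (process-exit-ok? &`{test -f Makefile})
    &`[out-to: 'inherit]{echo no Makefile})
```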

Exiting the current process

Procedure: exit [code]

Exits the Kawa interpreter, and ends the Java session. Returns the value of code to the operating system. The code must be an integer, or one of the special values #f (equivalent to -1) or #t (equivalent to 0). If code is not specified, zero is returned. The code is a status code; by convention a non-zero value indicates a non-standard (error) return.

Before exiting, finally-handlers (as in try-finally, or the after procedure of dynamic-wind) are executed, but only in the current thread, and only if the current thread was started normally. (Specifically if we’re inside an ExitCalled block with non-zero nesting - see gnu.kawa.util.ExitCalled.) Also, JVM shutdown hooks are executed - which includes flushing buffers of output ports. (Specifically Writer objects registered with the WriterManager.)

Procedure: emergency-exit [code]

Exits the Kawa interpreter, and ends the Java session. Communicates an exit value in the same manner as exit. Unlike exit, neither finally-handlers nor shutdown hooks are executed.
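
A common use of exit in scripts is to hand the exit code of a child process back to whoever started Kawa. A minimal sketch, where exit-with-status-of is a hypothetical helper and make test is a placeholder command:

```scheme
;; Hypothetical helper: wait for the process, then end Kawa with the
;; same exit code.
(define (exit-with-status-of proc)
  (exit (process-exit-wait proc)))

;; Typical use at the end of a script:
;; (exit-with-status-of &`[out-to: 'inherit]{make test})
```

Passing the result of process-exit-ok? to exit instead would collapse the code to 0 or -1, following the #t and #f convention described for exit.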

Deprecated functions

Procedure: make-process command envp

Creates a <java.lang.Process> object, using the specified command and envp. The command is converted to an array of Java strings (that is, an object that has type <java.lang.String[]>). It can be a Scheme vector or list (whose elements should be Java strings or Scheme strings); a Java array of Java strings; or a Scheme string. In the latter case, the command is converted using command-parse. The envp is the process environment; it should be either a Java array of Java strings, or the special #!null value.

Except for the representation of envp, this is similar to:
(run-process environment: envp command)

Procedure: system command

Runs the specified command, and waits for it to finish. Returns the return code from the command. The return code is an integer, where 0 conventionally means successful completion. The command can be any of the types handled by make-process.

Equivalent to:
(process-exit-wait (make-process command #!null))

Variable: command-parse

The value of this variable should be a one-argument procedure. It is used to convert a command from a Scheme string to a Java array of the constituent "words". The default binding, on Unix-like systems, returns a new command to invoke "/bin/sh" "-c" concatenated with the command string; on non-Unix systems, it is bound to tokenize-string-to-string-array.

Procedure: tokenize-string-to-string-array command

Uses a java.util.StringTokenizer to parse the command string into an array of words. This splits the command using spaces to delimit words; there is no special processing for quotes or other special characters. (This is the same as what java.lang.Runtime.exec(String) does.)