The World Wide Web Consortium is finalizing the specification for XQuery, aiming for a final release in late 2002. XQuery is a powerful and convenient language designed for processing XML data. That means not just files in XML format, but also other data (including databases) whose structure (nested, named trees with attributes) is similar to XML. XQuery is an interesting language with some unusual ideas. This article gives you a hawk's-eye view of XQuery, introducing the main ideas you should understand before you go deeper or actually try to use it.
The first thing to notice is that in XQuery everything is an expression that evaluates to a value. An XQuery program or script is just an expression, optionally together with some function and other definitions. So the following:
3+4
is a complete and valid XQuery program that evaluates to the integer 7.
There are no side-effects or updates in the XQuery standard, though those will probably be added at a future date. The standard specifies the result value of an expression or program, but it does not specify how it is to be evaluated. Therefore an implementation has considerable freedom in how it evaluates an XQuery program, and what optimizations it does.
Here is a conditional expression that evaluates to a string:
if (3 < 4) then "yes!" else "no!"
You can define local variables using a let expression:
let $x := 5
let $y := 6
return 10*$x+$y
This evaluates to 56.
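The primitive data types in XQuery are the same as those of XML Schema. They include numbers and strings of characters, for example "Hello world!". Strings are immutable, i.e. you cannot modify a character in a string. There are also a few XML-related types, such as a qualified name: a pair of a local name (like template) and a URL, which is used to represent a tag name like xsl:template after it has been namespace-resolved.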
XQuery of course also has the necessary data types needed to
represent XML values.
It does this using node values,
of which there are 7 kinds:
element, attribute, namespace, text, comment, processing-instruction,
and document (root) nodes.
These are very similar to the corresponding DOM classes
such as Node, Element, and so on.
Some XQuery implementations use DOM objects to implement node values,
though implementations may use other representations.
Various standard XQuery functions create or return nodes.
For example the document
function
reads an XML file specified by a URL argument,
and returns a document root node.
(The root element is a child of the root node.)
You can also create new node objects directly in the program. The most convenient way to do that is to use an element constructor expression, which looks just like regular XML data:
<p>See <a href="index.html"><i>here</i></a> for info.</p>
You can use {curly braces} to embed XQuery expressions inside element constructors:
let $i := 2 return
let $r := <em>Value </em> return
<p>{$r} of 10*{$i} is {10*$i}.</p>
creates:
<p><em>Value </em> of 10*2 is 20.</p>
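For readers more at home in a general-purpose language, the same construction can be sketched in Python with the standard xml.etree.ElementTree module. This only illustrates what the element constructor produces, not how an XQuery processor works; the variable names i, r, and p simply mirror the example.

```python
import xml.etree.ElementTree as ET

i = 2

# <em>Value </em>, the node bound to $r in the XQuery example
r = ET.Element("em")
r.text = "Value "

# <p>{$r} of 10*{$i} is {10*$i}.</p>
p = ET.Element("p")
p.append(r)
# ElementTree stores the text that follows a child element in its "tail"
r.tail = f" of 10*{i} is {10 * i}."

result = ET.tostring(p, encoding="unicode")
print(result)
```

The Python version has to assemble the tree step by step, whereas the XQuery constructor states the result directly and fills in the computed parts.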
Popular template processors such as JSP, ASP, and PHP allow you to embed expressions in a programming language into HTML content. XQuery gives you that ability, plus the ability to embed XML/HTML forms inside expressions, and to have them be the value of variables and parameters.
XQuery node values are immutable (you cannot modify a node after it has been created).
We've seen atomic values (numbers, strings, etc), and node values (elements, attributes, etc). These are together known as simple values. XQuery expressions actually evaluate to sequences of simple values. The comma operator can be used to concatenate two values or sequences. For example:
3,4,5
is a sequence consisting of 3 integers. Note that a sequence containing just a single value is the same as that value by itself, and you cannot nest sequences. To illustrate this, we'll use the count function, which takes one argument and returns the number of values in that sequence. So this expression:
let $a := (3,4)
let $b := ($a, $a)
let $c := 99
let $d := ()
return (count($a), count($b), count($c), count($d))
evaluates to (2, 4, 1, 0), because $b is the same as (3,4,3,4).
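The flattening rule can be modeled in a few lines of Python. This is a sketch of the semantics only: seq and count are hypothetical helper names standing in for the comma operator and the count function, and Python tuples play the role of sequences.

```python
def seq(*items):
    """Model the comma operator: concatenate items, flattening nested sequences."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            flat.extend(seq(*item))  # a nested sequence is spliced in, never kept nested
        else:
            flat.append(item)
    return tuple(flat)


def count(value):
    """A single value counts as a one-item sequence; the empty sequence has no items."""
    return len(seq(value))


a = seq(3, 4)
b = seq(a, a)
c = 99
d = seq()

result = seq(count(a), count(b), count(c), count(d))
print(result)  # (2, 4, 1, 0)
```

Because seq splices nested tuples into the result, b ends up as (3, 4, 3, 4) rather than a pair of pairs, which is why its count is 4.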
Many of the standard functions for working with nodes
return sequences. For example, the children
function
returns a sequence of the child nodes of the argument:
children(<p>This is <em>very</em> cool.</p>)
returns this sequence of 3 values:
"This is ", <em>very</em>, " cool."
XQuery borrows path expressions from XPath. In fact, XQuery can be viewed as a generalization of XPath: except for some obscure forms (mostly unusual "axis specifiers"), all XPath expressions are also XQuery expressions. For this reason the XPath specification is also being revised by the XQuery committee, with the plan that XQuery 1.0 and XPath 2.0 will be released at about the same time.
The following simple example assumes an XML file "mybook.xml"
whose root element is a <book>, and which contains some <chapter> children:
let $book := document("mybook.xml")/book return $book/chapter
The document function returns the root node of a document.
The /book expression selects the child elements of the root
that are named book, so $book gets set to the single root element.
The $book/chapter expression selects the child elements of the
top-level book element, which results in a sequence
of the second-level chapter nodes, in document order.
The next example includes a predicate:
$book//para[@class="warning"]
The double slash is a convenience syntax to select all
descendants (rather than just children) of $book.
The expression selects only those <para> element nodes
that have an attribute node named class whose value is "warning".
One difference to note between XPath and XQuery is that XPath expressions may return a node set, whereas the same XQuery expression will return a node sequence. For compatibility, these sequences will be in document order and with duplicates removed, which makes them equivalent to sets.
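Both path expressions can be tried out in Python, whose standard xml.etree.ElementTree module supports a limited subset of XPath. The sketch below is an illustration under stated assumptions: the sample document is made up and stands in for mybook.xml, and ElementTree is not an XQuery processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for mybook.xml; the content is made up for illustration.
BOOK_XML = """
<book>
  <chapter>
    <title>Basics</title>
    <para class="warning">Back up your data.</para>
  </chapter>
  <chapter>
    <title>Paths</title>
    <section>
      <para class="warning">Mind the axis.</para>
      <para>Nothing to see here.</para>
    </section>
  </chapter>
</book>
"""

# Roughly document("mybook.xml")/book: the <book> root element
book = ET.fromstring(BOOK_XML.strip())

# Roughly $book/chapter: child elements named chapter, in document order
chapters = book.findall("chapter")

# Roughly $book//para[@class="warning"]: matching descendants at any depth
warning_paras = book.findall(".//para[@class='warning']")

chapter_titles = [ch.findtext("title") for ch in chapters]
warning_texts = [p.text for p in warning_paras]
print(chapter_titles)
print(warning_texts)
```

Note that the second warning paragraph is nested inside a <section>, so only the descendant (double slash) form finds it.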
By the way: XPath expressions are mostly used as patterns in XSLT stylesheets. XSLT (XSL Transformation, where XSL stands for XML Stylesheet Language) is a rule-based language for transforming an input XML document into a result XML document. XSLT is very useful for expressing very simple transformations, but more complicated stylesheets (especially anything with non-trivial logic or programming) can often be written more compactly and readably using XQuery.
A for
expression lets you "loop" over the
elements of a sequence:
for $x in (1 to 3) return ($x,10+$x)
The for expression first evaluates the expression following the in.
Then for each value of the resulting sequence,
the variable (in this case $x) is bound to the value,
and the return expression is evaluated using that variable binding.
The value of the entire for expression is the
concatenation of all values of the return expression, in order.
So the example evaluates to this 6-element sequence:
1,11,2,12,3,13
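The evaluation rule just described can be modeled in Python. This is a sketch of the semantics, not of any implementation: for_each is a hypothetical helper name, tuples stand in for sequences, and the body function plays the part of the return expression.

```python
def for_each(sequence, body):
    """Model a for expression: bind each value, evaluate the body, concatenate results."""
    result = []
    for x in sequence:
        value = body(x)
        if isinstance(value, tuple):
            result.extend(value)  # results are concatenated, not nested
        else:
            result.append(value)  # a single value acts as a one-item sequence
    return tuple(result)


# for $x in (1 to 3) return ($x, 10+$x)
result = for_each(range(1, 4), lambda x: (x, 10 + x))
print(result)  # (1, 11, 2, 12, 3, 13)
```

Each evaluation of the body yields a two-item sequence, and the three results are joined end to end, giving six items rather than three pairs.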
Here is a more useful example.
Assume again that mybook.xml is a <book> that contains
some <chapter> elements.
Each <chapter> has a <title>.
The following will create a simple web page that just lists the titles:
<html>{
  let $book := document("mybook.xml")/book
  for $ch in $book/chapter
  return <h2>{$ch/title}</h2>
}</html>
The term "FLWR expressions" includes both for and let expressions. The acronym FLWR refers to the fact that it consists of one or more for and/or let clauses, an optional where clause, and a result clause. A where clause causes the result clause to be evaluated only when the where expression is true.
Below is an example illustrating the where clause. This example has a nested loop, allowing us to combine two sequences, one of customer elements and the other of order elements. We want to find the name(s) of customers who have ordered the part whose part_id is "xx".
for $c in customers
for $o in orders
where $c/cust_id=$o/cust_id and $o/part_id="xx"
return $c/name
This is essentially a join of two tables, as commonly performed using relational databases. An important goal for XQuery is that it should be usable as a query language for "XML databases". Compare the corresponding SQL statement:
select customers.name
from customers, orders
where customers.cust_id=orders.cust_id
and orders.part_id='xx'
Functions
XQuery wouldn't be much of a programming language without user-defined functions. Such function definitions appear in the query prologue of an XQuery program. It is worth noting that function parameters and function results can be primitive values, nodes, or sequences of either.
Below we define a recursive utility function. It returns all the descendant nodes of the argument, including the argument node itself. It does a depth-first traversal of the argument, returning the argument, and then looping over the argument node's children, recursively calling itself for each child.
define function descendant-or-self ($x)
{
  $x,
  for $y in children($x)
  return descendant-or-self($y)
}
descendant-or-self(<a>X<b>Y</b></a>)
evaluates to this sequence of length 4:
<a>X<b>Y</b></a>; "X"; <b>Y</b>; "Y"
Sorting and context
If you want to sort a sequence you can use a sortby expression. For example to sort a sequence of books in order of author name you can do:
$books sortby (author/name)
The sortby expression takes an input sequence (in this case $books) and one or more ordering expressions. During sorting the implementation needs to compare two values from the input sequence to determine which comes first. It does that by evaluating the ordering expression(s) in the context of a value from the input sequence. So the path expression author/name is evaluated many times, each time relative to a different book as the context (or current) item.
Path expressions also use and set the context. For example in author/name the name children that are returned are those of the context item, which is an author item.
Type specification
XQuery is a strongly typed programming language. Like Java, C#, and other languages, it is a mix of static typing (type consistency checked at compile-time) and dynamic typing (run-time type tests). However, the types in XQuery are different from the classes familiar from object-oriented programming. Instead, it has types to match XQuery's data model, and allows you to import types from XML Schema.
if ($child instance of element section)
then process-section($child)
else ( ) {--nothing--}
This invokes the process-section function if the value of $child is an element whose tag name is section.
XQuery has a convenient typeswitch shorthand for matching a value against a number of types. Here is an example to convert a set of tag names to a different set. It is a simple example of the kind of transformations that XSLT does well.
define function convert($x) {
  typeswitch ($x)
  case element para return <p>{process-children($x)}</p>
  case element emph return <em>{process-children($x)}</em>
  default return process-children($x)
}
define function process-children($x) {
  for $ch in children($x) return convert($ch)
}
Resources
The primary XQuery resource is www.w3.org/XML/Query at the World Wide Web Consortium. This has links to the draft standards, mailing lists, and implementations.
My article Generating XML and HTML using XQuery (www.gnu.org/software/qexo/XQ-Gen-XML.html) explains further how to generate XML documents and HTML web pages using XQuery.
Obviously, there are no complete standards-conforming implementations yet, but the XQuery site lists known implementations, some of which have executable demos. My own Qexo implementation is noteworthy in that it compiles XQuery programs on-the-fly directly to Java bytecodes, and it is open-source. I welcome you to experiment with it. But in any case, I do recommend considering XQuery when you need a powerful and convenient tool for analyzing or generating XML.
Copyright 2002 (C) Per Bothner <per@bothner.com>. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, version 1.1.