XML is an extremely versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. Because query languages have traditionally been designed for specific kinds of data, most existing proposals for XML query languages are robust for particular types of data sources but weak for other types. This specification describes a new query language called XQuery, which is designed to be broadly applicable across all types of XML data sources.

1 Introduction

As increasing amounts of information are stored, exchanged, and presented using XML, the ability to intelligently query XML data sources becomes increasingly important. One of the great strengths of XML is its flexibility in representing many different kinds of information from diverse sources. To exploit this flexibility, an XML query language must provide features for retrieving and interpreting information from these diverse sources.

XQuery is designed to meet the requirements identified by the W3C XML Query Working Group [XML Query 1.0 Requirements]. It is designed to be a small, easily implementable language in which queries are concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents. The Query Working Group has identified a requirement for both a human-readable query syntax and an XML-based query syntax. XQuery is designed to meet the first of these requirements. For an alternative, XML-based syntax for the XQuery semantics, see [XQueryX 1.0].

XQuery is derived from an XML query language called Quilt [Quilt], which in turn borrowed features from several other languages. From XPath [XPath 1.0] and XQL [XQL] it took a path expression syntax suitable for hierarchical documents. From XML-QL [XML-QL] it took the notion of binding variables and then using the bound variables to create new structures. From SQL [SQL] it took the idea of a series of clauses based on keywords that provide a pattern for restructuring data (the SELECT-FROM-WHERE pattern in SQL). From OQL [ODMG] it took the notion of a functional language composed of several different kinds of expressions that can be nested with full generality. Quilt was also influenced by other XML query languages such as Lorel [Lorel] and YATL [YATL].

Important issues remain open in the design of XQuery. Some of these issues deal with relationships between XQuery and other XML activities, for example:

For more details on open issues, see Appendix C.

2 The XQuery Language

Like OQL, XQuery is a functional language in which a query is represented as an expression. XQuery supports several kinds of expressions, and the structure and appearance of a query may differ significantly depending on which kinds of expressions are used. The various forms of XQuery expressions can be nested with full generality.

The input and output of a query are instances of a data model which is used by both XQuery 1.0 and XPath 2.0 [XQuery 1.0 and XPath 2.0 Data Model]. This data model is a refinement of the data model described in the XPath Version 1 specification [XPath 1.0], in which a document is modeled as a tree of nodes. The data model is capable of modeling not only an XML document but also a well-formed fragment of a document, a sequence of documents, or a sequence of document fragments. An instance of the data model is an ordered sequence of nodes, each of which may contain nested sequences of nodes. There is no distinction between a single node and a node sequence of length one. Although sequences are always ordered, an unordered function is provided which randomly reorders a sequence; this function effectively permits a query optimizer to generate the sequence in whatever order it finds most efficient.

The principal forms of XQuery expressions are as follows:

  1. Path expressions

  2. Element constructors

  3. FLWR expressions

  4. Expressions involving operators and functions

  5. Conditional expressions

  6. Quantified expressions

  7. Expressions that test or modify datatypes

The following sections introduce and explain each of the expression types listed above. Each section contains a specification of the relevant part of the XQuery syntax, and the syntax is repeated in its entirety in Appendix B. In general, when the syntax for an expression calls for a nested expression, any of the XQuery expression types may be used. The grammar in this document is designed to be readable and as a result it is rather permissive. An implementation of XQuery will need to augment the syntax given here with some semantic rules. For example, the syntax permits any expression to be used in an IF-clause, but semantically an IF-clause is valid only if its expression returns a Boolean result.

The semantics of the various types of XQuery expressions are described informally in this document and more formally in [XQuery 1.0 Formal Semantics]. However, XQuery is a declarative language, and its semantic specification is not intended to prescribe an implementation strategy. It is expected that implementations will use a variety of optimization algorithms to achieve the specified results.

This document introduces all the operators of XQuery, specifies the syntax of a function call, and gives several examples of built-in functions. An enumeration of all the functions supported by XQuery, and a detailed description of how the operators apply to the various primitive datatypes, are provided in a separate document called XQuery Functions and Operators (to appear).

In XQuery, keywords (such as FOR and LET) are case-insensitive, whereas identifiers (such as myBigBook) are case-sensitive. In this document, keywords are capitalized to make them easy to distinguish from identifiers. Keywords in XQuery are not reserved (the grammar has been designed in such a way that keywords may be used as names, and their role is resolved by context.)

A query may contain a comment, which is ignored during query processing. The beginning delimiter of a comment is a pound symbol ("#") and the ending delimiter is a newline character, as illustrated in the following example:

  # This is a comment.

2.1 Path Expressions

One of the forms of an XQuery expression is a path expression, based on the syntax of XPath [XPath 1.0]. XPath is a notation for navigating along "paths" in an XML document, and is used in several XML-related applications including XSLT [XSLT] and XPointer [XPointer].

The XML Query and XSL Working Groups are cooperatively working on both the syntax and the semantics for an extended version of XPath, to be known as XPath Version 2.0, based on the [XPath 2.0 Requirements]. One of these requirements states that "XPath 2.0 should maintain backward compatibility with XPath 1.0." XQuery is expected to be a superset of XPath 2.0.

This document relies on [XPath 1.0] as a specification of XPath 1.0, and focuses mainly on the extensions introduced by XPath 2.0 and XQuery.

PathExpr   ::=   RelativePathExpr
| ("/" RelativePathExpr?)
| ("//" RelativePathExpr?)
RelativePathExpr   ::=   StepExpr ( ("/" | "//") StepExpr)*
StepExpr   ::=   AxisStepExpr | OtherStepExpr
AxisStepExpr   ::=   Axis NodeTest StepQualifiers
OtherStepExpr   ::=   PrimaryExpr StepQualifiers
StepQualifiers   ::=   ( ("[" Expr "]") | ("=>" NameTest) )*
Axis   ::=   (NCName "::") | "@"
PrimaryExpr   ::=   "."
| ".."
| NodeTest
| Variable
| Literal
| FunctionCall
| ParenthesizedExpr
| CastExpr
| ElementConstructor
Literal   ::=   NumericLiteral | StringLiteral
NodeTest   ::=   NameTest | KindTest
NameTest   ::=   QName | Wildcard
KindTest   ::=   PITest | CommentTest | TextTest | AnyKindTest
PITest   ::=   "processing-instruction" "(" StringLiteral? ")"
CommentTest   ::=   "comment" "(" ")"
TextTest   ::=   "text" "(" ")"
AnyKindTest   ::=   "node" "(" ")"

A path expression can begin with an expression that identifies a specific node or sequence of nodes in a document. For example, the function document(string) returns the root node of a named document. A path expression can also begin with "/" or "//" which represents an implicit root node, determined by the environment in which the path expression is executed. The execution environment of a path expression also defines a "context node," which can be referenced by dot (".") inside the path expression.

A path expression consists of a series of "steps". Each step represents movement through a document along a specified "axis", and each step can apply one or more predicates to eliminate nodes that fail to satisfy a given condition. The result of each step is a sequence of nodes that serves as a starting point for the next step. The default axis is the "child" axis, and unless otherwise specified, the first step in a path returns nodes from among the immediate children of the context node.

Ed. Note: The XQuery grammar used in this document includes full axis syntax, but the examples use abbreviated syntax. Priority feedback is requested on whether XQuery should support full axis syntax and on which axes should be supported. See issue xquery-xpath-axes in Appendix C.

The result of a path expression is a sequence of nodes or primitive values. The nodes in a path expression result are ordered according to their position in the original hierarchy, in document order (as defined in [XPath 1.0].) If the result of a path expression includes nodes that are in different documents, the ordering of these nodes is implementation-dependent. The result of a path expression may contain duplicate values (i.e., multiple nodes with the same name, type, and content), but it will not contain duplicate nodes (i.e., multiple occurrences of the same node).

The following example uses a path expression consisting of three steps. The first step locates the root node of a document. The second step locates the second chapter of the document (more formally, it locates <chapter> descendants of the root node that are the second such element within their respective parents.) The third step finds figure elements occurring anywhere within the chapter, but retains only those figure elements that have a caption with the value "Tree Frogs."

(Q1) In the second chapter of the document named "zoo.xml", find the figure(s) with caption "Tree Frogs".
document("zoo.xml")//chapter[2]//figure[caption = "Tree Frogs"]
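As a rough illustration of how the three steps of Q1 compose, the following Python sketch evaluates a comparable path against a small in-memory document using the standard xml.etree.ElementTree module. The sample document, the names ZOO and q1, and the use of ElementTree's limited XPath subset in place of an XQuery processor are assumptions made for illustration only; they are not part of XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for "zoo.xml".
ZOO = """
<book>
  <chapter><title>Monkeys</title>
    <figure><caption>Tree Frogs</caption></figure>
  </chapter>
  <chapter><title>Frogs</title>
    <section>
      <figure><caption>Tree Frogs</caption></figure>
      <figure><caption>Bullfrogs</caption></figure>
    </section>
  </chapter>
</book>
"""

def q1(root):
    """Approximate //chapter[2]//figure[caption = "Tree Frogs"]."""
    hits = []
    # Step 2: <chapter> elements that are the second such child of their parent.
    for chapter in root.findall(".//chapter[2]"):
        # Step 3: figures anywhere below the chapter, filtered by caption value.
        hits.extend(chapter.findall(".//figure[caption='Tree Frogs']"))
    return hits

root = ET.fromstring(ZOO)   # Step 1: locate the root of the document.
result = q1(root)
```

The figure in the first chapter has a matching caption but is not selected, because the predicate on the chapter step has already narrowed the starting points for the third step.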

XPath allows a node to be selected from a sequence of nodes by specifying its ordinal number in the sequence (as in the step chapter[2] in Q1). XQuery allows the selection condition to contain a sequence of integers that specify the ordinal numbers of the nodes to be selected. The sequence can be specified by literal numbers (as in 1, 3, 5, 7) or by an expression that generates a sequence of consecutive integers (as in 1 TO 8). The first node in a sequence is considered to have ordinal number 1.

(Q2) Find all the figures in chapters 2 through 5 of the document named "zoo.xml."
document("zoo.xml")//chapter[2 TO 5]//figure
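Selection by a sequence of ordinal numbers can be pictured with a short Python sketch. The helper names select_by_ordinals and to_range are hypothetical, and two points are assumptions rather than statements of this specification: that ordinals beyond the end of the sequence are silently ignored, and that selected items keep their original order.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")

def select_by_ordinals(items: Sequence[T], ordinals: Iterable[int]) -> List[T]:
    """Keep the items whose 1-based position appears in `ordinals`."""
    wanted = set(ordinals)
    return [item for pos, item in enumerate(items, start=1) if pos in wanted]

def to_range(first: int, last: int) -> range:
    """Model `first TO last`, which includes both endpoints."""
    return range(first, last + 1)

chapters = ["ch1", "ch2", "ch3", "ch4", "ch5", "ch6"]
picked_range = select_by_ordinals(chapters, to_range(2, 5))   # chapter[2 TO 5]
picked_list = select_by_ordinals(chapters, [1, 3, 5, 7])      # literal ordinals
```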

In addition to the usual operators of XPath, XQuery introduces an operator called the dereference operator ("=>"). The operand on the left side of the dereference operator must be a node of type IDREF or IDREFS. The dereference operator returns the element(s) that are referenced by the attribute. A dereference operator is followed by a "name test" that specifies the name of the target element. Target elements not matching the given name are not returned. Following the usual XPath convention, a name test of "*" allows the target element to have any name.

A dereference operator can be used only with documents that have schemas or DTDs, since the operator needs to find nodes of type ID, IDREF, and IDREFS, which can be done only by reference to a schema or DTD.

Dereference operators can be used in the steps of a path expression. For example, the following query uses a dereference operator to find the caption of the "fig" element referenced by the "refid" attribute of a "figref" element.

(Q3) Find captions of figures that are referenced by <figref> elements in the chapter of "zoo.xml" with title "Frogs".
document("zoo.xml")//chapter[title = "Frogs"]
   //figref/@refid=>fig/caption

The XQuery dereference operator is similar in purpose to the id function of XPath. However, the right-arrow notation is designed to be easier to read, especially in path expressions that involve multiple dereferences. For example, suppose that a given document contains a set of <emp> elements, each of which contains a "mgr" attribute. The "mgr" attribute is of type IDREF, and it references another <emp> element that represents the manager of the given employee. The name of each employee is represented by a <name> element nested inside the <emp> element.

(Q4) List the names of the second-level managers of all employees whose rating is "Poor".
//emp[rating = "Poor"]/@mgr=>emp/@mgr=>emp/name
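The chained dereferences in Q4 can be modeled, very loosely, as two lookups in an index from ID values to elements. In the Python sketch below, the sample data, the helper deref, and the convention that an attribute named "id" carries the ID are all assumptions; a real processor would learn which attributes are of type ID and IDREF from a schema or DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: Cy reports to Bob, who reports to Ann.
STAFF = """
<staff>
  <emp id="e1"><name>Ann</name><rating>Good</rating></emp>
  <emp id="e2" mgr="e1"><name>Bob</name><rating>Good</rating></emp>
  <emp id="e3" mgr="e2"><name>Cy</name><rating>Poor</rating></emp>
  <emp id="e4" mgr="e1"><name>Di</name><rating>Poor</rating></emp>
</staff>
"""

def deref(index, node, attr, name_test):
    """Model `node/@attr=>name_test`: follow an IDREF to its target element."""
    target = index.get(node.get(attr))
    if target is not None and name_test in ("*", target.tag):
        return target
    return None

root = ET.fromstring(STAFF)
# Assumption: the "id" attribute plays the role of the schema-declared ID.
index = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

second_level = []
for emp in root.findall(".//emp[rating='Poor']"):
    mgr = deref(index, emp, "mgr", "emp")                    # @mgr=>emp
    top = deref(index, mgr, "mgr", "emp") if mgr is not None else None
    if top is not None and top not in second_level:          # no duplicate nodes
        second_level.append(top)

names = [e.findtext("name") for e in second_level]           # .../name
```

Di contributes nothing to the result: her manager has no "mgr" attribute, so the second dereference yields no element.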

In XPath, each step selects a set of nodes relative to a context node. Each node in that set is then used in turn as a context node for the following step. The sets of nodes selected in this way are combined, as though by the UNION operator (defined in Section 2.5.4), to obtain the result of the following step. XQuery extends XPath 1.0 by allowing each step in a path expression to contain any XQuery expression, enclosed in parentheses as needed to avoid ambiguity. As usual, the nodes selected by one step are used as context nodes for the expression in the following step. The following example shows how an expression containing a UNION (denoted in this query by the "|" operator) can be used in one step of a path expression.

(Q5) Find all captions of figures and tables in the chapter of "zoo.xml" with title "Monkeys".
document("zoo.xml")//chapter[title = "Monkeys"]
   //(figure | table)/caption

Ed. Note: This query uses a union within a step. In our current grammar, this requires the use of full expression syntax within the steps of a path, which has been recommended by the joint Query/XSLT Task Force but has not yet been approved by the respective Working Groups. See issue xquery-full-expression-syntax in Appendix C.
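The effect of a union inside a step, as in Q5, can be sketched in Python as a merge of two node lists that removes duplicate nodes by identity. The helper union and the sample chapter are hypothetical, and returning the merged nodes in document order is an assumption carried over from the ordering rule for path expression results.

```python
import xml.etree.ElementTree as ET

def union(root, *node_lists):
    """Combine node lists, dropping duplicate nodes (by identity) and
    returning the survivors in document order."""
    order = {id(node): pos for pos, node in enumerate(root.iter())}
    unique = {id(n): n for nodes in node_lists for n in nodes}
    return sorted(unique.values(), key=lambda n: order[id(n)])

# Hypothetical chapter with figures and tables at different depths.
chapter = ET.fromstring(
    "<chapter><title>Monkeys</title>"
    "<table><caption>Diet</caption></table>"
    "<figure><caption>Gibbon</caption></figure>"
    "<section><table><caption>Range</caption></table></section>"
    "</chapter>"
)

# //(figure | table)/caption, evaluated relative to the chapter.
step = union(chapter, chapter.findall(".//figure"), chapter.findall(".//table"))
captions = [n.findtext("caption") for n in step]
```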

Path expressions often contain operators, such as arithmetic operators, that are defined over simple datatypes. When such an operator is used with an operand that is a node, the built-in data function is implicitly invoked to extract the content of the node as a typed value. If data is invoked with no argument, its implicit argument is the current (context) node. For example, if the content of the current node is the integer 47, then data() has type integer and value 47. If the content of its argument node cannot be expressed as a value of a simple type, the data function raises an error.

When an operator defined over simple datatypes is used with an operand that is a sequence of nodes, the data function is invoked repeatedly to extract the contents of the nodes, resulting in a sequence of simple values. The semantics of various operators, when applied to sequences of simple values, are described in Section 2.5, and in more detail in XQuery Functions and Operators (to appear).

The following example illustrates implicit and explicit invocations of the data function. If the query selects a single employee who has a single salary, the result of the query will be an integer. If the query selects multiple employees, or a single employee with multiple salaries, the result of the query will be a sequence of integers.

(Q6) From a document that contains employees and their monthly salaries, extract the annual salary of the employee named "Fred".
//emp[name="Fred"]/salary * 12
(Q6, alternate form) (Equivalent to Q6, with explicit invocation of data.)
//emp[name="Fred"]/salary/data() * 12

Note the difference between the result-type of the expression //emp/salary, which returns zero or more elements, and the expression //emp/salary*12, which returns zero or more integers.
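The implicit use of the data function in Q6 can be approximated in Python as follows. The functions data and times are hypothetical stand-ins, and treating every salary as an integer is an assumption; in XQuery the type of the extracted value would come from the schema of the document.

```python
import xml.etree.ElementTree as ET

def data(node):
    """Model the data function: return the node's content as a typed value.
    Assumption: the content is an integer; real typing comes from a schema."""
    text = (node.text or "").strip()
    try:
        return int(text)
    except ValueError:
        raise TypeError(f"content of <{node.tag}> is not a simple integer value") from None

def times(nodes, factor):
    """Model `nodes * factor`: extract each node's value, then multiply."""
    return [data(n) * factor for n in nodes]

# Hypothetical sample data: Fred has two monthly salaries.
doc = ET.fromstring(
    "<staff>"
    "<emp><name>Fred</name><salary>3000</salary><salary>500</salary></emp>"
    "<emp><name>Gina</name><salary>4000</salary></emp>"
    "</staff>"
)

fred_salaries = doc.findall(".//emp[name='Fred']/salary")   # a sequence of elements
annual = times(fred_salaries, 12)                           # a sequence of integers
```

Here fred_salaries holds elements while annual holds integers, mirroring the difference in result type between //emp/salary and //emp/salary*12.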

2.2 Element Constructors

In addition to searching for elements in existing documents, a query often needs to generate new elements. The simplest way to generate a new element is to embed the element directly in a query using XML notation. In other words, one of the permitted forms of an XQuery expression is an XML element that represents itself. This type of an XQuery expression is called an element constructor. By adopting XML notation for element constructors, XQuery allows literal XML fragments to be "pasted" into queries (with some limitations--for example, the handling of entity references is still under discussion.)

In most parts of the XQuery language, white space is not significant. However, according to the definition of XML, white space is significant under certain circumstances inside an XML tag. Therefore, in the following fragment of the XQuery grammar, white space is significant and is represented by the nonterminal symbol "S".

ElementConstructor   ::=   "<" NameSpec AttributeList ("/>" |
           (">" ElementContent* "</" (QName S?)? ">") )
NameSpec   ::=   QName | ( "{" Expr "}" )
AttributeList   ::=   (S (NameSpec S? "=" S? (AttributeValue
           | EnclosedExpr) AttributeList)? )?
AttributeValue   ::=   ( ["] AttributeValueContent* ["] )
| ( ['] AttributeValueContent* ['] )
ElementContent   ::=   Char
| ElementConstructor
| EnclosedExpr
| CdataSection
| CharRef
| PredefinedEntityRef
AttributeValueContent   ::=   Char
| CharRef
| EnclosedExpr
| PredefinedEntityRef
CdataSection   ::=   "<![CDATA[" Char* "]]>"
EnclosedExpr   ::=   "{" ExprSequence "}"

The simplest example of an element constructor in XQuery uses pure XML notation, as in Q7:

(Q7) Generate an <emp> element that has an "empid" attribute and nested <name> and <job> elements.
<emp empid = "12345">
   <name>John Smith</name> 

Often the content of an element or the value of an attribute needs to be computed by some expression. An XQuery expression that is used inside an element constructor is enclosed in curly braces to indicate that the expression is to be evaluated rather than treated as text. In the following example, attribute values and element contents are specified in the form of variables named $id, $name, and $job. From the context of the query, we might expect $id to be bound to a string, and $name and $job to be bound to nodes or node sequences (though they could also be bound to strings). When a variable used in an element constructor is bound to a node, the newly constructed element receives a copy of that node and all its descendants.

(Q8) Generate an <emp> element that has an "empid" attribute. The value of the attribute and the content of the element are specified by variables that are bound in other parts of the query.
<emp empid = {$id}> {$name} {$job} </emp>

In the above example, curly braces ("{" and "}") are used in places where they would ordinarily be considered to be part of a character string. Since XQuery uses curly braces as delimiters to identify an expression to be evaluated, some other means is needed to denote a curly brace used as an ordinary character. For this purpose, XQuery adopts the same convention as XSLT: Two adjacent curly braces in an XQuery character string are interpreted as a single curly brace character.

Ed. Note: The above example could not be parsed by an off-the-shelf XML parser because of the curly braces around the attribute value. An alternative would be to require the attribute value to be enclosed in quotes, and use curly braces inside the quoted value to denote an expression to be evaluated. See issue xquery-quote-computed-attribute-value in Appendix C.

Occasionally it is necessary for the name of an element or attribute to be computed by an expression. For this purpose, XQuery allows the name of an element or attribute to be an XQuery expression enclosed in curly braces. When the name in a start-tag is an expression, the name must be omitted from the corresponding end-tag (however, when a start-tag contains a constant name, the same name must be specified in the matching end-tag.)

The following example uses the XPath functions name(element), which returns the tagname of an element, and number(element), which returns the content of an element expressed as a number. When an expression inside the body of an element constructor evaluates to one or more attributes, those attributes are considered to be attributes of the element that is being constructed.

(Q9) Variable $e is bound to some element with numeric content. Construct a new element having the same name and attributes as $e, and with numeric content equal to twice the content of $e.
<{name($e)}>   # replicates the name of $e
   {$e/@*}            # replicates the attributes of $e
   {2 * number($e)}   # doubles the content of $e
</>

Ed. Note: name($e) may not be the right function to use here, since name() returns a string and we need a QName.
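For readers who think in terms of a tree API, the construction described in Q9 corresponds roughly to the following Python sketch using xml.etree.ElementTree. The function double_element and the sample element are hypothetical, and representing the doubled content as a floating-point number is an assumption about what number($e) would return.

```python
import xml.etree.ElementTree as ET

def double_element(e):
    """Build a new element with e's name and attributes and twice its numeric content."""
    new = ET.Element(e.tag)               # <{name($e)}>
    new.attrib.update(e.attrib)           # {$e/@*}
    new.text = str(2 * float(e.text))     # {2 * number($e)}
    return new

e = ET.fromstring('<weight unit="kg">21</weight>')   # hypothetical binding for $e
doubled = double_element(e)
serialized = ET.tostring(doubled, encoding="unicode")
```

The new element is a separate node: changing it afterwards leaves the element bound to $e untouched.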

Like elements, XML comments and processing instructions can be generated simply by embedding them in a query using the usual XML notation, as in the examples below. It is important to note that XML comment notation generates a comment in the query result, whereas XQuery comment notation (delimited by "#" and end-of-line) serves as a comment in the query itself but has no effect on the query result.

(Q10) Generate an XML comment and a processing instruction.
<!-- Houston, we have a problem. -->

<?MyFormatter fontsize=47 ?> 

Ed. Note: Comments and processing instructions have not yet been added to the XQuery grammar. See issue xquery-comment-pi-productions in Appendix C.

2.3 FLWR Expressions

FLWRExpr   ::=   (ForClause | LetClause)+ WhereClause? "return" Expr
ForClause   ::=   "for" Variable "in" Expr ("," Variable "in" Expr)*
LetClause   ::=   "let" Variable ":=" Expr ("," Variable ":=" Expr)*
WhereClause   ::=   "where" Expr

A FLWR (pronounced "flower") expression is constructed from FOR, LET, WHERE, and RETURN clauses, which must appear in a specific order. A FLWR expression binds values to one or more variables and then uses these variables to construct a result. The overall flow of data in a FLWR expression is illustrated in Figure 1.

The first part of a FLWR expression consists of FOR-clauses and/or LET-clauses, which serve to bind values to one or more variables. The values to be bound to the variables are represented by expressions (for example, path expressions).

A FOR-clause is used whenever iteration is needed. The FOR-clause introduces one or more variables, associating each variable with an expression. For example, a FOR-clause might contain a path expression that returns a sequence of nodes. The result of the FOR-clause is a sequence of tuples, each of which contains a binding for each of the variables in the FOR-clause. The variables are bound to individual values returned by their respective expressions. Each variable in a FOR-clause can be thought of as iterating over the values returned by its respective expression, in order.

A LET-clause is also used to bind one or more variables to one or more expressions. Unlike a FOR-clause, however, a LET-clause simply binds each variable to the value of its respective expression without iteration, resulting in a single binding for each variable. The difference between a FOR-clause and a LET-clause can be illustrated by a simple example. The clause FOR $x IN /library/book results in many bindings, each of which binds the variable $x to one book in the library. On the other hand, the clause LET $x := /library/book results in a single binding which binds the variable $x to a sequence containing all the books in the library.

A FLWR expression may contain several FOR and LET-clauses. Expressions used in FOR and LET-clauses may contain references to variables bound earlier in the FLWR expression. The result of the FOR and LET clauses is an ordered sequence of tuples of bound variables. If all the FOR-clause expressions are independent, the number of tuples generated is the product of the cardinalities of all the FOR-clause expressions. A FLWR expression that contains no FOR-clauses generates exactly one binding-tuple. The order of the tuples generated by the FOR and LET clauses is determined by the order in which values are returned by the FOR-clause expressions. The order in which the variables are bound determines the order of nested iteration of the FLWR expression.
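The tuple stream produced by FOR and LET clauses can be sketched in Python. The helpers for_bindings and let_binding are hypothetical; the sketch covers only independent FOR-clause expressions and assumes that the first variable bound corresponds to the outermost iteration.

```python
from itertools import product

def for_bindings(**sequences):
    """Model independent FOR-clauses: one tuple per combination, with the
    first variable varying slowest (the outermost iteration)."""
    names = list(sequences)
    return [dict(zip(names, combo)) for combo in product(*sequences.values())]

def let_binding(tuples, name, value):
    """Model a LET-clause: bind the whole value once in every tuple."""
    return [{**t, name: value} for t in tuples]

books = ["b1", "b2", "b3"]
shops = ["s1", "s2"]

tuples = for_bindings(b=books, s=shops)        # FOR $b IN ..., $s IN ...
tuples = let_binding(tuples, "all", books)     # LET $all := ...
only_let = let_binding([{}], "all", books)     # no FOR-clause at all
```

With three books and two shops the FOR-clauses yield six tuples, the product of the two cardinalities, while the LET-only case yields exactly one tuple whose variable is bound to the whole sequence.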

Figure 1: Flow of data in a FLWR expression

The binding-tuples generated by the FOR and LET clauses are subject to further filtering by an optional WHERE-clause. Only those tuples for which the condition in the WHERE-clause is true are used to invoke the RETURN clause. The WHERE-clause may contain several predicates, connected by AND and OR. These predicates usually contain references to the bound variables. Variables bound by a FOR-clause are usually bound to individual nodes and so they are typically used in scalar predicates such as $p/color = "Red". Variables bound by a LET-clause, on the other hand, often represent sequences of nodes, and can be used in set-oriented predicates such as avg($p/price) > 100. The ordering of the binding-tuples generated by the FOR and LET clauses is preserved by the WHERE-clause.

The RETURN-clause generates the output of the FLWR expression, which may be any sequence of nodes or primitive values. The RETURN-clause is executed once for each tuple of bindings that is generated by the FOR and LET-clauses and satisfies the condition in the WHERE-clause, preserving the order of these tuples. The RETURN-clause contains an expression that often contains element constructors, references to bound variables, and nested subexpressions. The results generated by the individual executions of the RETURN clause are concatenated together, preserving their order (therefore, unlike a path expression, the result of a FLWR expression may contain duplicate nodes based on node-identity.)

We will consider some examples of FLWR expressions based on a document named "bib.xml" that contains a sequence of <book> elements. Each <book> element, in turn, contains a <title> element, one or more <author> elements, a <publisher> element, a <year> element, and a <price> element. The first example is so simple that it could have been expressed using a path expression, but it is perhaps more readable when expressed as a FLWR expression.

(Q11) List the titles of books published by Morgan Kaufmann in 1998.
FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title
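Q11 maps closely onto a filtered iteration. The Python sketch below mirrors its FOR, WHERE, and RETURN clauses in a list comprehension over a hypothetical stand-in for "bib.xml"; the sample data is an assumption, and ElementTree is used here only as a convenient tree API.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for "bib.xml".
BIB = """
<bib>
  <book><title>A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>B</title><publisher>Addison-Wesley</publisher><year>1998</year></book>
  <book><title>C</title><publisher>Morgan Kaufmann</publisher><year>1999</year></book>
  <book><title>D</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
</bib>
"""

bib = ET.fromstring(BIB)
titles = [
    b.find("title")                                   # RETURN $b/title
    for b in bib.iter("book")                         # FOR $b IN ...//book
    if b.findtext("publisher") == "Morgan Kaufmann"   # WHERE $b/publisher = ...
    and b.findtext("year") == "1998"                  # AND $b/year = "1998"
]
title_texts = [t.text for t in titles]
```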

The above example returns titles of books in the order in which they appeared in the original document ("bib.xml"). If the ordering of the query result is not important, the unordered function can be used to indicate that books can be processed in any order, possibly resulting in a more efficient execution, as in the following example:

(Q11, alternate form) (Equivalent to Q11, relaxing constraint on order of result)
FOR $b IN unordered(document("bib.xml")//book)
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title

The next example uses a distinct function in the FOR-clause to eliminate duplicate values from the set of publishers found in the input document. Two elements are considered to have duplicate values if their names, attributes, and normalized content are equal. From each set of nodes with duplicate values, the distinct function retains one node (the node that is retained is implementation-defined.) The distinct function also includes the semantics of the unordered function (that is, the ordering of the sequence returned by a distinct function is not significant or deterministic.) The example uses a LET-clause to bind a variable to the average price of books published by each of the publishers bound in the FOR-clause.

(Q12) List each publisher and the average price of its books.
FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
      <name> {$p/text()} </name> 
      <avgprice> {$a} </avgprice>
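The combination of distinct in a FOR-clause with an aggregate in a LET-clause, as described for Q12, can be approximated in Python. The helper distinct_values and the sample data are hypothetical, and comparing only element name and text is a simplifying assumption that stands in for the fuller rule on names, attributes, and normalized content.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical sample data standing in for "bib.xml".
BIB = """<bib>
  <book><publisher>Acme</publisher><price>10</price></book>
  <book><publisher>Zenith</publisher><price>40</price></book>
  <book><publisher>Acme</publisher><price>30</price></book>
</bib>"""

bib = ET.fromstring(BIB)

def distinct_values(elements):
    """Model distinct for simple text-only elements: keep one per distinct value.
    Assumption: equality of tag and text stands in for the full rule."""
    seen = {}
    for e in elements:
        seen.setdefault((e.tag, (e.text or "").strip()), e)
    return list(seen.values())

averages = {}
for p in distinct_values(bib.iter("publisher")):        # FOR $p IN distinct(...)
    prices = [float(b.findtext("price"))
              for b in bib.iter("book")
              if b.findtext("publisher") == p.text]     # book[publisher = $p]/price
    averages[p.text] = mean(prices)                     # LET $a := avg(...)
```

The averages are kept in a dictionary rather than a list because the sequence returned by distinct carries no significant order.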

The next example uses a LET-clause to bind a variable $b to a set of books, and then uses a WHERE-clause to apply a condition to the set, retaining only bindings in which $b contains more than 100 elements. This query also illustrates the common practice of enclosing a FLWR expression inside an element constructor, which provides an enclosing element for the query result.

(Q13) List the publishers who have published more than 100 books.
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $b := document("bib.xml")//book[publisher = $p]
    WHERE count($b) > 100
    RETURN $p

FLWR expressions are often useful for performing structural transformations on documents, as illustrated by the next query, which inverts a hierarchy. This example also illustrates how one FLWR expression can be nested inside another.

(Q14) Invert the structure of the input document so that, instead of each book element containing a sequence of authors, each distinct author element contains a sequence of book-titles.
    FOR $a IN distinct(document("bib.xml")//author)
         <name> {$a/text()} </name>
          FOR $b IN document("bib.xml")//book[author = $a]
          RETURN $b/title
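The inversion described in Q14 amounts to an outer loop over distinct authors with an inner loop over books. The Python sketch below follows that shape; the function invert, the wrapper element name "authors", and the sample data are hypothetical, and comparing authors by text alone is a simplification of the distinct function.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for "bib.xml".
BIB = """<bib>
  <book><title>T1</title><author>Ann</author><author>Bob</author></book>
  <book><title>T2</title><author>Bob</author></book>
</bib>"""

bib = ET.fromstring(BIB)

def invert(bib):
    """Return one <author> element per distinct author name, each holding
    a <name> and copies of the titles of that author's books."""
    out = ET.Element("authors")            # hypothetical wrapper name
    seen = []
    for a in bib.iter("author"):           # outer FLWR over authors
        if a.text in seen:                 # crude stand-in for distinct(...)
            continue
        seen.append(a.text)
        entry = ET.SubElement(out, "author")
        ET.SubElement(entry, "name").text = a.text
        for b in bib.iter("book"):         # nested FLWR over the books
            if any(x.text == a.text for x in b.findall("author")):
                ET.SubElement(entry, "title").text = b.findtext("title")
    return out

inverted = invert(bib)
by_author = {e.findtext("name"): [t.text for t in e.findall("title")]
             for e in inverted.findall("author")}
```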

LET-clauses are useful for breaking up long expressions, making queries more readable. They can also be helpful in simplifying a query that makes multiple uses of the same subexpression. In the following example, the average price of books is a common subexpression that is bound to variable $a and then used repeatedly in the body of the query.

(Q15) For each book whose price is greater than the average price, return the title of the book and the amount by which the book's price exceeds the average price.
    LET $a := avg(document("bib.xml")//book/price)
    FOR $b IN document("bib.xml")//book
    WHERE $b/price > $a
             {$b/price - $a}

Computed element names and attribute names can sometimes be used with FLWR expressions to perform a transformation on the structure of a document. In the following example, variable $e is bound to an element that has some attributes and some subelements. The content of each subelement is simple text. The example uses the attribute function, which constructs an attribute from two strings that contain the name and value of the attribute.

(Q16) Construct a new element having the same name as the element bound to $e. Transform all the attributes of $e into subelements, and all the subelements of $e into attributes.
<{name($e)}>
   {FOR $c IN $e/*
    RETURN attribute(name($c), string($c))}
   {FOR $a IN $e/@*
    RETURN <{name($a)}> {string($a)} </>}
</>

2.4 Sorting

SortExpr   ::=   Expr "sortby" "(" SortSpecList ")"
SortSpecList   ::=   Expr ("ascending" | "descending")? ("," SortSpecList)?

It is sometimes necessary to control the order of elements in a sequence. The sequence to be ordered might be represented by a whole query or by a subexpression within a query. A sequence can be ordered by means of a SORTBY clause that contains one or more "ordering expressions." Each ordering expression is evaluated for each member of the sequence (that is, the ordering expression is evaluated with each member of the sequence as context node). For each member of the sequence, the ordering expression must return a single value of some type for which the ">" operator is defined (for example, a number or a string); otherwise an error results. The members of the sequence are ordered according to the values of their respective ordering expressions. If more than one ordering expression is specified, the leftmost ordering expression controls the primary sort, followed by the remaining ordering expressions from left to right. Each ordering expression can be followed by the word ASCENDING or DESCENDING, which specifies the direction of the sort (ASCENDING is the default). The following query uses a SORTBY clause with two ordering expressions:

(Q17) List all books with price greater than $100, in order by first author; within each group of books with the same first author, list the books in order by title.
document("bib.xml")//book[price > 100] SORTBY (author[1], title)
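The ordering rules described above can be modeled outside XQuery. The following Python sketch is only an illustration of the semantics, not part of the language: the helper name sortby, the dictionary representation of books, and the sample data are all assumptions made for this example.

```python
from functools import cmp_to_key

def sortby(items, specs):
    """Order items by a list of (key_function, direction) ordering specs.

    direction is "ascending" or "descending"; the leftmost spec controls
    the primary sort, and later specs only break ties.
    """
    def compare(a, b):
        for key, direction in specs:
            ka, kb = key(a), key(b)
            if ka == kb:
                continue  # tie: fall through to the next ordering expression
            result = 1 if ka > kb else -1
            return result if direction == "ascending" else -result
        return 0
    return sorted(items, key=cmp_to_key(compare))

# Hypothetical sample data standing in for bib.xml
books = [
    {"author": "Stevens", "title": "TCP/IP Illustrated", "price": 130},
    {"author": "Abiteboul", "title": "Data on the Web", "price": 120},
    {"author": "Stevens", "title": "Advanced Programming", "price": 110},
]

# Analogue of Q17: filter on price, then order by author and by title
expensive = [b for b in books if b["price"] > 100]
ordered = sortby(expensive, [(lambda b: b["author"], "ascending"),
                             (lambda b: b["title"], "ascending")])
titles = [b["title"] for b in ordered]
```

The comparison function relies only on ">" and equality, which mirrors the requirement that each ordering expression return a value for which ">" is defined.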

If a query result contains several levels of nested elements, an ordering may be specified among the elements at each level. This is often accomplished by nested FLWR expressions and sorting expressions. It is important to remember that SORTBY is not a part of a FLWR expression, but is a separate form of expression that can be used in many different contexts.

(Q18) Make an alphabetic list of publishers. Within each publisher, make a list of books, each containing a title and a price, in descending order by price.
   {FOR $p IN distinct(document("bib.xml")//publisher)
          <name> {$p/text()} </name> 
          {FOR $b IN document("bib.xml")//book[publisher = $p]
           SORTBY(price DESCENDING)

2.5 Operators in Expressions

Like most query languages, XQuery provides a variety of operators that can be used in expressions, and allows parenthesized expressions to serve as operands. The following sections summarize the categories of operators that are provided by XQuery. For details of operator precedence, see Appendix B. For details of the semantics of operators and the datatypes to which they apply, see XQuery Functions and Operators (to appear).

2.5.1 Arithmetic operators

AdditiveExpr   ::=   Expr ("+" | "-") Expr
MultiplicativeExpr   ::=   Expr ("*" | "div" | "mod") Expr
UnaryExpr   ::=   ("-" | "+") Expr

XQuery provides the usual arithmetic operators for addition, subtraction, multiplication, division, and modulus, in the usual binary and unary forms and with the usual meanings.

When both operands of an arithmetic operator are numeric, the result is straightforward. When one or more operands is a node, the content of the node is extracted by an implicit call to the data function and converted to a number before the operation is performed; if this conversion is not possible, an error results. When one operand is a number and the other is a sequence of numbers, the result of the operator is a sequence that is created by applying the operator pairwise to the numeric operand and each individual member of the sequence operand, preserving the original order.

Ed. Note: The semantics of arithmetic between two sequences is an open issue. See issue xquery-arithmetic-among-sequences in Appendix C.
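As an informal illustration of the rule for a numeric operand combined with a sequence operand, the following Python sketch models a sequence as a list. The function name arith is hypothetical, and the sketch deliberately refuses the sequence-with-sequence case because its semantics are still open.

```python
import operator

def arith(op, left, right):
    """Apply a binary operator where at most one operand is a list of numbers."""
    if isinstance(left, list) and isinstance(right, list):
        # The draft leaves arithmetic between two sequences undefined.
        raise NotImplementedError("sequence-with-sequence arithmetic is an open issue")
    if isinstance(left, list):
        return [op(member, right) for member in left]   # original order preserved
    if isinstance(right, list):
        return [op(left, member) for member in right]   # original order preserved
    return op(left, right)

total = arith(operator.add, 10, [1, 2, 3])
```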

2.5.2 Comparison operators

EqualityExpr   ::=   Expr ("=" | "!=" | "==" | "!==") Expr
RelationalExpr   ::=   Expr ("<" | "<=" | ">" | ">=") Expr

XQuery supports several comparison operators, each of which takes two operands and returns a Boolean result. If one operand is a single value and the other is a sequence, the result of the comparison is true if there exists some member of the sequence for which the comparison with the single operand is true. If both operands are sequences, the comparison is true if there exists some member of the first sequence and some member of the second sequence for which the comparison is true.
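The existential rule for comparisons involving sequences can be sketched in Python as follows. This is a model of the semantics only; the helper names are assumptions, and sequences are represented as lists.

```python
import operator

def as_sequence(value):
    """Treat a single value as a sequence of length one."""
    return value if isinstance(value, list) else [value]

def compare(op, left, right):
    """True if some pair (a, b) drawn from the two operands satisfies op."""
    return any(op(a, b) for a in as_sequence(left) for b in as_sequence(right))

found = compare(operator.eq, 3, [1, 2, 3])
```

One consequence of this rule is that both "=" and "!=" can be true for the same pair of sequences: in the model, compare(operator.eq, [1, 2], [1, 2]) and compare(operator.ne, [1, 2], [1, 2]) both return True.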

The =, !=, <, <=, >, and >= operators perform value comparisons. If both operands are simple values of the same type, the result is straightforward. If the operands are simple values of compatible types, the operand of the less-inclusive type is converted to the more-inclusive type for the purpose of comparison (for example, an integer might be converted to a float in order to be compared with a float.) If one operand is a node and the other is a simple value, the content of the node is extracted by an implicit invocation of the data function before the comparison is performed. If both operands are nodes, the string-values of the nodes are compared, as defined in [XPath 1.0].

The == and !== operators perform "node identity" comparisons that are defined only for nodes or sequences of nodes. If both operands of == are nodes, the comparison is true only if both operands are the same node (not just nodes with the same name and value). If either or both operands is a node sequence, the rules stated above apply. The !== comparison is true whenever the == comparison is not true.

Ed. Note: There is currently no way to specify a collation sequence to specify the semantics of the inequality operators. See issue xquery-collation-sequences in Appendix C.

2.5.3 Logical operators

OrExpr   ::=   Expr "or" Expr
AndExpr   ::=   Expr "and" Expr

The AND and OR operators of XQuery take two Boolean operands and return a Boolean result, using the usual semantics for these operators. Unlike many query languages, XQuery does not support a logical NOT operator. However, it does provide a function not() that takes a Boolean value as its argument and returns the logical negation of the argument.

Ed. Note: XQuery has not yet defined how it handles missing or absent values. See issue xquery-three-value-logic in Appendix C.

2.5.4 Sequence-related Operators

ParenthesizedExpr   ::=   "(" ExprSequence? ")"
ExprSequence   ::=   Expr ("," Expr)*
RangeExpr   ::=   Expr "to" Expr
UnionExpr   ::=   Expr ("union" | "|") Expr
IntersectExceptExpr   ::=   Expr ("intersect" | "except") Expr
BeforeAfterExpr   ::=   Expr ("before" | "after") Expr

As described earlier, the XPath 2.0 Data Model [XQuery 1.0 and XPath 2.0 Data Model] supports ordered sequences of values, and does not distinguish between a single value and a sequence of one value. Values in sequences may be either nodes or primitive values such as numbers or strings. Sequences are always one level deep--that is, a sequence never appears as a member of another sequence. All operators that generate sequences automatically "flatten" their operands. For example, if two sequences are combined to form a new sequence, the new sequence is a one-level sequence derived from the individual members of the original sequences.

The basic XQuery operator for forming sequences is the comma operator (","). The comma operator can be applied to any two expressions to combine them into a sequence. The length of the resulting sequence is the sum of the lengths of the original sequences (scalar values are considered to be sequences of length one.) The new sequence consists of all the members of the left-hand sequence, followed in order by all the members of the right-hand sequence, with duplicates preserved. A sequence of zero values is represented by empty parentheses.

Since the comma operator is also used to separate the arguments of a function call, parentheses may be needed when a sequence is used as the argument of a function, as illustrated in the following examples:

f(1, 2, 3)       Denotes a function call with three scalar arguments.
f((1, 2), 3)     Denotes a function call with two arguments, the first of which is a sequence of two values.
f((1, 2, 3))     Denotes a function call with one argument that is a sequence of three values.
f(1, ( ))        Denotes a function call with two arguments, the second of which is an empty sequence.
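The flattening behavior of the comma operator can be modeled with a short Python sketch. The function name comma is hypothetical, and lists stand in for sequences.

```python
def comma(*operands):
    """Concatenate operands into one flat list, preserving order and duplicates."""
    result = []
    for operand in operands:
        if isinstance(operand, list):
            result.extend(operand)   # members are spliced in, never nested
        else:
            result.append(operand)   # a scalar is a sequence of length one
    return result

nested = comma(comma(1, 2), 3)   # models the expression (1, 2), 3
```

In this model the result length is always the sum of the operand lengths, and comma() with no operands plays the role of empty parentheses.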

Another way to generate a sequence is by means of the TO operator. TO is a binary operator that converts both of its operands to integers. It then generates a sequence containing all the integers from the left-hand operand to the right-hand operand, inclusive. If either of the operands cannot be converted to an integer, an error results. If the left-hand operand is larger than the right-hand operand, the sequence of integers is generated in descending order. For example, the expression 12 TO 8 is equivalent to the expression 12, 11, 10, 9, 8.
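A Python sketch of the TO operator follows. The function name to is hypothetical, and the use of Python's int() for operand conversion is an assumption: the draft does not state how non-integer operands are converted.

```python
def to(left, right):
    """Return all integers from left to right, inclusive, in either direction."""
    start, end = int(left), int(right)   # raises ValueError if conversion fails
    step = 1 if start <= end else -1     # descend when the left operand is larger
    return list(range(start, end + step, step))

countdown = to(12, 8)
```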

The operators UNION, INTERSECT, and EXCEPT can be used to combine node sequences to form new node sequences. UNION (equivalent to "|") returns a sequence containing those nodes that are members of either the left-hand or the right-hand operand. (The "|" form of this operator is retained for compatibility with XPath 1.0, and the same operator can be invoked by the UNION keyword for consistency with INTERSECT and EXCEPT.) INTERSECT returns a sequence containing those nodes that are members of both the left-hand and right-hand operands. EXCEPT returns a sequence containing those nodes that are members of the left-hand but not the right-hand operand. The result of UNION, INTERSECT, or EXCEPT contains no duplicate nodes, based on node identity--in other words, the same node will not appear more than once in the result, but different nodes may appear that have the same name and value. The result of UNION, INTERSECT, or EXCEPT is a sequence in which all the nodes appear in document order if they are all in the same document; if the resulting nodes are not in the same document, their order is implementation-defined.

Ed. Note: The definitions of UNION, INTERSECT, and EXCEPT for simple values are still under discussion. See xquery-set-operators-on-values in Appendix C.
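The identity-based behavior of UNION, INTERSECT, and EXCEPT on nodes of a single document can be illustrated with the following Python sketch. The Node class and its position field (standing in for document order) are assumptions made for this example, not part of the data model.

```python
class Node:
    """Hypothetical stand-in for a document node; position is its document order."""
    def __init__(self, name, value, position):
        self.name, self.value, self.position = name, value, position

def _in_document_order(nodes):
    unique = {id(n): n for n in nodes}   # deduplicate by identity, not by value
    return sorted(unique.values(), key=lambda n: n.position)

def union(left, right):
    return _in_document_order(left + right)

def intersect(left, right):
    right_ids = {id(n) for n in right}
    return _in_document_order([n for n in left if id(n) in right_ids])

def except_(left, right):
    right_ids = {id(n) for n in right}
    return _in_document_order([n for n in left if id(n) not in right_ids])

a = Node("author", "Twain", 1)
b = Node("author", "Twain", 2)   # same name and value as a, but a different node
c = Node("title", "Roughing It", 3)

combined = union([c, a], [a, b])
```

Here a and b both survive in the union because they are distinct nodes, even though their names and values match.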

From XQL [XQL], XQuery inherits the infix operators BEFORE and AFTER, which are useful in searching based on position in a sequence. BEFORE operates on two sequences of nodes and returns those nodes in the first sequence that occur before at least one node in the second sequence in document order (of course, this is possible only if the two sequences are subsets of the same document.) AFTER is defined in a similar way. Since BEFORE and AFTER are based on global document ordering, they can compare the positions of nodes that do not have the same immediate parent. The next two examples illustrate the use of BEFORE and AFTER by retrieving excerpts from a surgical report that includes <procedure>, <incision>, and <anesthesia> elements.

(Q19) Prepare a "critical sequence" report consisting of all elements that occur between the first and second incision in the first procedure.
   LET $p := //procedure[1]
   FOR $e IN //* AFTER ($p//incision)[1] 
          BEFORE ($p//incision)[2]
   RETURN shallow($e)

The shallow function makes a shallow copy of a node, including attributes but not including subelements.

(Q20) Find procedures in which no anesthesia occurs before the first incision.
# Finds potential lawsuits
FOR $p IN //procedure
WHERE not(empty($p//incision))
AND empty($p//anesthesia BEFORE ($p//incision)[1])
RETURN $p

The empty function returns True if and only if its argument is an empty sequence.

2.6 Conditional Expressions

IfExpr   ::=   "if" "(" Expr ")" "then" Expr "else" Expr

A conditional expression evaluates a test expression and then returns one of two result expressions. If the value of the test expression is True, the value of the first result expression is returned; otherwise, the value of the second result expression is returned.

Ed. Note: The grammar for XQuery allows any expression as a conditional. We have not yet determined which types of test expressions can be implicitly converted to Boolean, and which raise errors. See xquery-anything-to-boolean in Appendix C.

As an example of a conditional expression, consider a library that has many holdings, each described by a <holding> element with a "type" attribute that identifies its type: book, journal, etc. All holdings have a title and other nested elements that depend on the type of holding.

(Q21) Make a list of holdings, ordered by title. For journals, include the editor, and for all other holdings, include the author.
FOR $h IN //holding
RETURN
   <holding>
      {$h/title,
       IF ($h/@type = "Journal")
       THEN $h/editor
       ELSE $h/author}
   </holding>
SORTBY (title)

Note the syntactic structure of the above query. The query consists of a SORTBY applied to a FLWR-expression that contains an element constructor. The element constructor generates <holding> elements. The content of these elements is specified by a sequence that consists of a path expression and a conditional expression, separated by a comma. The following is an equivalent formulation of the same query in which the element constructor contains two separate XQuery expressions rather than a single sequence expression. Note that no comma is used in the alternative form.

(Q21, alternate form)
FOR $h IN //holding
RETURN
   <holding>
      {$h/title}
      {IF ($h/@type = "Journal")
       THEN $h/editor
       ELSE $h/author}
   </holding>
SORTBY (title)

2.7 Quantified Expressions

Occasionally it is necessary to test for existence of some element that satisfies a condition, or to determine whether all elements in some collection satisfy a condition. For this purpose, XQuery provides two forms of expression called the "some" expression and the "every" expression. These forms of expression are also known as quantified expressions. The "some" expression uses an existential quantifier, and the "every" expression uses a universal quantifier.

SomeExpr   ::=   "some" Variable "in" Expr "satisfies" Expr
EveryExpr   ::=   "every" Variable "in" Expr "satisfies" Expr

The "some" expression is illustrated in the next example. The value of a "some" expression is always True or False. Like the FOR-clause of a FLWR expression, a "some" expression generates multiple bindings for a variable, using values returned by the expression in the IN-clause. For each of these bindings, the SATISFIES expression is evaluated. If at least one evaluation of the SATISFIES expression returns the Boolean value True, then the value of the "some" expression is True; otherwise the value of the "some" expression is False. Of course, if the expression in the IN-clause does not return any nodes, the SATISFIES expression is not evaluated and the result of the "some" expression is False.

(Q22) Find titles of books in which both sailing and windsurfing are mentioned in the same paragraph.
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
   (contains($p, "sailing") AND contains($p, "windsurfing"))
RETURN $b/title

The "every" expression is illustrated in the next example. Like the "some" expression, the "every" expression always returns True or False, and it executes the SATISFIES-clause once for each node returned by the IN-clause. If at least one execution of the SATISFIES expression returns the Boolean value False, then the value of the "every" expression is False; otherwise the value of the "every" expression is True. Of course, if the expression in the IN-clause does not return any nodes, the SATISFIES expression is not evaluated and the result of the "every" expression is True.

(Q23) Find titles of books in which sailing is mentioned in every paragraph.
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
   contains($p, "sailing")
RETURN $b/title

Q23 also returns books that contain no paragraphs.
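The behavior of the two quantifiers, including the empty-sequence cases, can be modeled in Python. The function names some and every and the sample paragraph strings are assumptions made for this sketch.

```python
def some(sequence, satisfies):
    """Existential quantifier: True if at least one binding satisfies the test."""
    return any(satisfies(item) for item in sequence)

def every(sequence, satisfies):
    """Universal quantifier: False if at least one binding fails the test."""
    return all(satisfies(item) for item in sequence)

# Hypothetical paragraph contents of one book
paragraphs = ["sailing and windsurfing", "sailing only"]

both_mentioned = some(paragraphs, lambda p: "sailing" in p and "windsurfing" in p)
always_sailing = every(paragraphs, lambda p: "sailing" in p)
```

With an empty list, some returns False and every returns True, which is why Q23 also returns books that contain no paragraphs.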

It is sometimes useful to nest quantified expressions, as in the following example:

(Q24) Assume that employees can have multiple skills and multiple duties. Find names of employees who have some duty that is not matched by a skill.
FOR $e IN //emp
WHERE SOME $d IN $e/duty SATISFIES
   not(SOME $s IN $e/skill SATISFIES $s = $d)
RETURN $e/name

Ed. Note: Open issue: should quantifiers be able to bind more than one variable? See issue xquery-quantifier-multiple-variables in Appendix C.

2.8 Datatypes

XQuery has a type system that is based on XML Schema. By using the datatype names defined in the namespace http://www.w3.org/2001/XMLSchema (hereafter abbreviated as xsd), all the primitive and derived datatypes of XML Schema can be used in queries. Other complex types declared using XML Schema can also be referred to by their qualified names.

In XQuery, type names appear in function declarations where they specify the types of the function parameters and result. Type names are also used in CAST and TREAT expressions and as operands of the INSTANCEOF operator, as described in Section 2.11.

Certain XML Schema datatypes have literal forms that are recognized by XQuery, as illustrated by the following examples:

Type           Example of literal
xsd:string     "Hello"
xsd:integer    47, -369
xsd:decimal    -2.57
xsd:float      -3.805E-2

Literal values of XML Schema types other than string, integer, decimal, and float can be specified by means of constructor functions such as true(), false(), and date("2000-06-25"), or by cast expressions such as CAST AS xsd:positiveInteger(47).

Ed. Note: The set of constructor functions has not yet been fixed. A complete specification of XQuery literal formats and constructor functions will be provided in XQuery Functions and Operators (to appear.)

2.9 Functions

FunctionDefn   ::=   "define" "function" QName "(" ParamList? ")"
           ("returns" Datatype)? EnclosedExpr
ParamList   ::=   Param ("," Param)*
Param   ::=   Datatype? Variable
FunctionCall   ::=   QName "(" (Expr ("," Expr)*)? ")"

XQuery provides a core library of built-in functions. We have already introduced some of these core functions, such as document, which returns the root node of a named document. The XQuery core function library contains all the functions of the XPath core function library, as well as aggregation functions such as avg, sum, count, max, and min, and many other useful functions. For example, the distinct function eliminates duplicate nodes from a sequence, and the empty function returns True if and only if its argument is an empty sequence. A complete specification of the XQuery core function library is provided in XQuery Functions and Operators (to appear.)

In addition to the core functions, XQuery allows users to define functions of their own. A function definition specifies the name of the function, the names and datatypes of the parameters, and the datatype of the result. Datatypes are specified by their qualified names. A function definition also provides an expression (called the "function body") that defines how the result of the function is computed from its parameters. When a function is invoked, its arguments must be valid instances of the declared parameter types. The result of a function must also be a valid instance of the declared result type.

If a function parameter is declared using a name but no type, it is considered to have the default type "any node." If the RETURNS clause is omitted from a function definition, the result-type of the function is considered to be "any sequence of nodes."

Ed. Note: The syntax for declaring types in a function definition is still under discussion. A notation is needed to distinguish element-names from type-names. A notation is also needed to distinguish a single node from a sequence of nodes. A "wildcard" notation is also needed to denote "any element" or "any node." Various alternatives have been proposed for these notations. In this document, function definitions that need "wildcard" types will use the defaults described above (no explicit type declaration). See issue xquery-function-definition in Appendix C.

XQuery Version 1 does not allow user-defined functions to be overloaded--that is, it does not allow multiple functions to be declared with the same name. We consider function overloading to be a useful and important feature that deserves further study in future versions of XQuery. Although XQuery does not allow overloading of user-defined functions, some of the built-in functions in the XQuery core library are overloaded--for example, the string function of XPath can convert an instance of almost any type into a string, and it can be invoked with either one argument or zero arguments.

Ed. Note: The XQuery rules for function calls need to allow for functions with optional arguments that default to the current node, such as data(). Many XPath functions behave like this. Should this apply to all functions? All unary functions? Only certain functions? See issue xquery-implicit-current-node in Appendix C.

It is possible in XQuery to invoke a function with arguments whose types do not exactly match the declared parameter types of the function, under the following circumstances:

  1. A fixed "promotion hierarchy" is defined among the primitive and derived types of XML Schema. A function whose declared parameter type is in this hierarchy can be invoked with an argument whose type is lower in the hierarchy. For example, a function with a declared parameter-type of Float can be invoked with an integer argument. In such a case, the argument is converted to the declared type of the parameter.

    Ed. Note: The promotion hierarchy for function parameters needs to be defined. See issue xquery-schema-type-promotion in Appendix C.

  2. A function whose declared parameter is a Schema complex type can be called with an argument of another complex type that is derived from the parameter type by extension or by restriction. This is called the principle of subtype substitutability.

    Ed. Note: This rule is still under discussion. It is possible that subtype substitutability may not be supported in the first version of XQuery. See issue xquery-subtype-substitutability in Appendix C.

  3. A function whose declared parameter is a Schema element can be called with an argument that is any element in a substitution group whose head is the declared parameter element.

    Ed. Note: This rule is still under discussion. It is possible that substitution groups may not be supported in the first version of XQuery. See issue xquery-substitution-groups in Appendix C.

  4. A function whose declared parameter is a sequence of a given type can be invoked with an argument that is a single instance of the given type (this is obvious, since there is no distinction between a single object and an object sequence of length one.)

  5. A function whose declared parameter is a given type can be invoked with an argument that is a sequence of the given type. In this case, the function is invoked on each of the members of the argument sequence, in order, and the results are concatenated into a result sequence. For example, suppose that the function price(Book) is declared to take a Book and return an integer. If this function is invoked on a path expression, as in price(//Book[author="Mark Twain"]), the result is a sequence of integers representing the prices of the Books returned by the path expression, in order. This rule is intended to be consistent with XPath, in which each step in a path expression iterates over the nodes returned by the previous step without requiring an explicit iteration operator.

    This rule generalizes to functions with multiple parameters in the following way: suppose that N arguments of a function-call are sequences that match function-parameters where single elements are expected. Then the result of the function-call is a sequence whose members are formed by invoking the function on tuples of arguments taken from the Cartesian product of the N input sequences. The Cartesian product is formed by nested iteration over the input sequences, working from left to right.

    Ed. Note: This rule is still under discussion. If implicit iteration over argument-sequences is not supported in XQuery, the same result can be obtained by explicit iteration using either a FLWR-expression or a path-expression. See issue xquery-sequence-for-single in Appendix C.
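Rule 5 and its generalization to several parameters can be sketched in Python as follows. This is an illustrative model under stated assumptions: the helper name call_with_iteration is hypothetical, lists stand in for sequences, and every parameter is assumed to expect a single value.

```python
from itertools import product

def call_with_iteration(function, *arguments):
    """Invoke function on each tuple of the Cartesian product of its arguments.

    Scalars are treated as sequences of length one. product() varies the
    rightmost argument fastest, which corresponds to nested iteration over
    the input sequences working from left to right.
    """
    sequences = [arg if isinstance(arg, list) else [arg] for arg in arguments]
    results = []
    for combination in product(*sequences):
        outcome = function(*combination)
        # Results are concatenated into one flat result sequence.
        results.extend(outcome if isinstance(outcome, list) else [outcome])
    return results

pairs = call_with_iteration(lambda x, y: x * 10 + y, [1, 2], [3, 4])
```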

A function may be defined recursively--that is, it may reference its own definition. Mutually recursive functions, whose bodies reference each other, are also allowed. The next query contains an example of a recursive function that computes the depth of an element hierarchy. In its definition, the user-defined function depth calls the built-in functions empty and max.

(Q25) Find the maximum depth of the document named "partlist.xml".
NAMESPACE xsd = "http://www.w3.org/2001/XMLSchema"

DEFINE FUNCTION depth($e) RETURNS xsd:integer
{
   # An empty element has depth 1
   # Otherwise, add 1 to max depth of children
   IF (empty($e/*)) THEN 1
   ELSE max(depth($e/*)) + 1
}

depth(document("partlist.xml"))

In the above example, since no explicit type is declared for the function parameter, the function accepts any node as its argument. The expression $e/* returns all the children of the argument node, and the expression depth($e/*) implicitly iterates over these children, returning their respective depths as a sequence of integers. The expression max(depth($e/*)) finds the maximum depth of any child of the argument node.

Ed. Note: If implicit iteration over argument sequences is not supported, the expression max(depth($e/*)) could be expressed as max(for $c in $e/* return depth($c)) or as max($e/*/depth(.)).
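For comparison, the same recursion can be written with explicit iteration in a general-purpose language. The following Python sketch uses the standard xml.etree.ElementTree module; the inline sample document is an assumption standing in for "partlist.xml".

```python
import xml.etree.ElementTree as ET

def depth(element):
    """An element without child elements has depth 1; otherwise 1 + deepest child."""
    children = list(element)   # child elements only, like $e/*
    if not children:
        return 1
    return max(depth(child) for child in children) + 1

# Hypothetical stand-in for the contents of partlist.xml
doc = ET.fromstring("<part><part><part/></part><part/></part>")
max_depth = depth(doc)
```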

To further illustrate the power of functions, we will write a function that returns all the elements that are "connected" to a given element by child or reference connections, and a recursive function that returns all the elements that are "reachable" from a given element by child or reference connections.

(Q26) In the document "company.xml", find all the elements that are reachable from the employee with serial number 12345 by child or reference connections.
DEFINE FUNCTION connected($e)
   { $e/* UNION $e/@*=>* }

DEFINE FUNCTION reachable($e)
   { $e UNION reachable(connected($e)) }


The connected and reachable functions each take (by default) a node as parameter and return a sequence of nodes as result. The reachable function invokes itself recursively to find all the elements that are reachable from the elements that are directly connected to its parameter node.

Ed. Note: If implicit iteration over argument sequences is not supported, the expression reachable(connected($e)) could be expressed as connected($e)/reachable(.).

The filter function (described more fully in Section 3.1) can be used to select a set of nodes from a hierarchy while preserving the original relationships among these nodes. In the following example, filter is used together with the reachable function defined in the previous example to return all the elements that are reachable from a specific employee element, while preserving their hierarchic and sequential relationships.

(Q27) Return a fragment of the document "company.xml" consisting of all nodes reachable from employee no. 12345, preserving the original relationships among the reachable nodes.

Of course, it is possible to write a recursive function that fails to terminate for some set of arguments. In fact, the reachable function in the previous example will fail to terminate if called on an element that references one of its ancestors. It is the user's responsibility to avoid writing a nonterminating function call.
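One common way to make a reachability computation terminate on cyclic data is to remember which nodes have already been expanded. The following Python sketch shows that technique; it is not something this draft defines, and the dictionary-based graph and its node names are assumptions made for the example.

```python
def reachable(start, graph):
    """Return every node reachable from start, including start itself.

    graph maps a node name to the names it is connected to by child or
    reference connections.
    """
    seen = set()
    pending = [start]
    while pending:
        node = pending.pop()
        if node in seen:
            continue          # already expanded; this is what breaks cycles
        seen.add(node)
        pending.extend(graph.get(node, []))
    return seen

# Hypothetical data in which manager refers back to emp, forming a cycle
graph = {"emp": ["dept"], "dept": ["manager"], "manager": ["emp"]}
nodes = reachable("emp", graph)
```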

2.10 User-Defined Datatypes

In addition to the primitive and derived datatypes of XML Schema, any datatype that can be constructed using the definition facilities of XML Schema can be used as an XQuery datatype. XML Schema language can be used to define an element or datatype and to give it a qualified name, which can then be used in an XQuery function declaration. For example, a schema might define an element named PurchaseOrder by specifying a set of attributes and a content model based on sequences and alternations of various other elements. PurchaseOrder could then be used as the type of a function parameter in a query.

A query can refer to element-names and type-names that are defined in any of the following schemas:

  1. Schemas that are referenced by documents used in the query; i.e., the implicit input document(s) or any document referenced by the document function.

  2. Schemas that are explicitly referenced by NAMESPACE and SCHEMA declarations, as described in Section 2.12.

In the following examples, a schema defines complex types named emp_sequence and dept_sequence in the target namespace http://www.bigcompany.example.com/BigNames. A query then uses these datatypes to define and invoke a function.

(Q28) Using XML Schema language, define complex types named emp_sequence and dept_sequence in a target namespace.
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"

   <complexType name="emp_sequence">
         <element name="emp" minOccurs="0" maxOccurs="unbounded">
                  <element name="name" type="string"/>
                  <element name="deptno" type="string"/>
                  <element name="salary" type="decimal"/>
                  <element name="location" type="string"/>

   <complexType name="dept_sequence">
         <element name="dept" minOccurs="0" maxOccurs="unbounded">
                  <element name="deptno" type="string"/>
                  <element name="headcount" type="integer"/>
                  <element name="payroll" type="decimal"/>

(Q29) Using the datatypes defined in Q28, create a function that summarizes employees by department, and use this function to summarize all the employees of Acme Corp. that are located in Denver.
DEFAULT NAMESPACE = "http://www.bigcompany.example.com/BigNames"
SCHEMA "http://www.bigcompany.example.com/BigNames" 

DEFINE FUNCTION summary(emp_sequence $emps) RETURNS dept_sequence
   FOR $d IN distinct($emps/deptno)
   LET $e := $emps[deptno = $d]
         <headcount> {count($e)} </headcount>
         <payroll> {sum($e/salary)} </payroll>

summary(document("acme_corp.xml")//emp[location = "Denver"] )

It is sometimes desirable to validate that the result of a query conforms to a specific datatype. This can be done by taking advantage of the fact that XQuery functions validate the datatypes of their parameters and results. For example, suppose that some query Q is intended to generate output that conforms to the datatype abc:PurchaseOrder (for some suitable binding of the namespace prefix abc). The query Q can be type-validated by "wrapping" it in a function that takes an instance of the desired type as a parameter and simply returns it, as in the following example:

(Q30) Validate that the result of a query Q conforms to the datatype abc:PurchaseOrder (Q may be arbitrarily complex).
DEFINE FUNCTION check(abc:PurchaseOrder $po) RETURNS abc:PurchaseOrder
   { $po }

2.11 Operations on Datatypes

InstanceofExpr   ::=   Expr "instanceof" "only"? Datatype
TypeSwitchExpr   ::=   "typeswitch" "(" Expr ")" ("as" Variable)?
           CaseClause+ "default" "return" Expr
CaseClause   ::=   "case" Datatype "return" Expr
CastExpr   ::=   (("cast" "as") | ("treat" "as")) Datatype "(" Expr ")"
Datatype   ::=   QName

The Boolean operator INSTANCEOF returns True if its first operand is an instance of the type named in its second operand; otherwise it returns False. For example, $x INSTANCEOF zoonames:animal returns True if the dynamic type of $x is zoonames:animal or a subtype of zoonames:animal. If the keyword ONLY is specified, INSTANCEOF returns True only if the dynamic type of its first operand exactly matches the specified type, excluding subtypes from consideration. For example, $x INSTANCEOF ONLY zoonames:animal returns True if the dynamic type of $x is zoonames:animal but not a subtype of zoonames:animal.
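The difference between INSTANCEOF and INSTANCEOF ONLY resembles the difference between a subtype-aware test and an exact type test in an object-oriented language. The following Python sketch is an analogy only; the classes and the function name instance_of are assumptions.

```python
class Animal:
    pass

class Duck(Animal):
    pass

def instance_of(value, type_, only=False):
    """With only=True, subtypes are excluded from consideration."""
    if only:
        return type(value) is type_
    return isinstance(value, type_)

duck_is_animal = instance_of(Duck(), Animal)
duck_is_only_animal = instance_of(Duck(), Animal, only=True)
```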

The datatype that can be associated with an expression by static analysis of a query is called the static type of the expression. During execution, the actual value of an expression may have a type (called its dynamic type) that is a subtype of its static type. For example, a variable named $mypet whose static type is Animal may contain a value of type Duck, a subtype of Animal. The dynamic type of an expression may influence the processing of a query. For example, if the variable $mypet contains a value of dynamic type Duck, a query might invoke the function quack($mypet), which is defined only for values of type Duck. The typeswitch expression of XQuery allows users to write queries that are sensitive to dynamic type.

In a typeswitch expression, the TYPESWITCH keyword is followed by an expression enclosed in parentheses, called the operand expression. This is the expression whose dynamic type is being tested. The operand expression can be followed by an AS clause that defines a variable, called the operand variable, to represent the value of the operand expression. The remainder of the typeswitch expression consists of one or more CASE clauses and a DEFAULT clause.

Each CASE clause specifies the name of a type, which must be a subtype of the static type of the operand expression, followed by a RETURN expression. The effective case is the first CASE clause such that the value of the operand expression is an instance of the type named in the CASE clause. The value of the typeswitch expression is the value of the RETURN expression in the effective case. If the value of the operand expression is not an instance of any type named in a CASE clause, the value of the typeswitch expression is the value of the RETURN expression in the DEFAULT clause.

The RETURN expressions in CASE and DEFAULT clauses typically contain references to the operand variable (defined in the AS clause). If the operand expression consists of a single variable, it can serve as the operand variable and the AS clause can be omitted. The AS clause can also be omitted if the RETURN expressions do not contain references to the operand variable.
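
The selection of the effective case can be sketched in Python. This is a model of the rules above, not an implementation of XQuery: Python classes are assumed to stand in for types, each RETURN expression is represented by a function that receives the operand value (playing the role of the operand variable), and the requirement that each CASE type be a subtype of the operand's static type is not checked. All names are hypothetical.

```python
class Animal:
    pass


class Duck(Animal):
    pass


class Dog(Animal):
    pass


def typeswitch(operand, cases, default):
    """Evaluate a typeswitch over an already-evaluated operand value.

    cases   -- ordered list of (type, return_fn) pairs, one per CASE clause
    default -- return_fn for the DEFAULT clause
    """
    for case_type, return_fn in cases:
        if isinstance(operand, case_type):  # the first matching CASE is effective
            return return_fn(operand)
    return default(operand)


def sound(animal):
    return typeswitch(
        animal,
        [(Duck, lambda a: "quack"), (Dog, lambda a: "woof")],
        lambda a: "No sound",
    )
```

Because the first matching CASE clause wins, a CASE that names a supertype hides any later CASE that names one of its subtypes; in this model, listing Animal before Duck would make the Duck case unreachable.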

The following example shows how a typeswitch expression can be used to simulate a primitive form of subtype polymorphism. A more robust treatment of polymorphic functions is deferred to a later version of XQuery.

(Q31) Define a function named sound(animal) that returns different strings for various types of animals. Use the function in a query that returns the sounds made by all of Billy's pets.
# First define some functions to set the stage
NAMESPACE xsd = "http://www.w3.org/2001/XMLSchema"
DEFAULT NAMESPACE = "http://www.abc.example.com/names"
SCHEMA "http://www.abc.example.com/names" 

DEFINE FUNCTION quack(duck $d) RETURNS xsd:string
   { "String depends on properties of duck" }

DEFINE FUNCTION woof(dog $d) RETURNS xsd:string
   { "String depends on properties of dog" }

# This function illustrates simulated subtype polymorphism

DEFINE FUNCTION sound(animal $a) RETURNS xsd:string
   {
      TYPESWITCH ($a)
         CASE duck RETURN quack($a)
         CASE dog RETURN woof($a)
         DEFAULT RETURN "No sound"
   }

# This query returns the sounds made by all of Billy's pets

FOR $p IN //kid[name="Billy"]/pet
RETURN sound($p)

Ed. Note: If subtype substitutability is not supported in XQuery Version 1, the motivation for TYPESWITCH is weakened and the decision to support TYPESWITCH should be revisited. See issue xquery-subtype-substitutability in Appendix C.

Occasionally it is necessary to explicitly convert a value from one datatype to another. For the primitive and derived types of XML Schema, a CAST notation is supported that provides conversions between certain combinations of types. For example, the notation CAST AS xsd:integer (x DIV y) converts the result of x DIV y into the xsd:integer type. The set of type conversions supported by the CAST operator is specified in XQuery Functions and Operators (to appear). Conversions among user-defined datatypes are not supported by the CAST notation, but user-defined functions can be written for this purpose.

In addition to CAST, XQuery provides a notation called TREAT. Rather than converting an expression from one datatype to another, TREAT causes the query processor to treat an expression as though its datatype were a subtype of its static type. For example, TREAT AS Cat($mypet) tells the query processor to treat the variable $mypet as though it were an instance of the type Cat, even though the static type of $mypet is a supertype of Cat such as Animal. This notation allows functions that require an argument of type Cat to be invoked on the variable $mypet. At query execution time, if the dynamic type of $mypet is not Cat, an error results.
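
The contrast between CAST and TREAT can be sketched in Python. The sketch assumes that Python classes stand in for types and that Python's int() stands in for a conversion to xsd:integer; the actual conversion rules are left to XQuery Functions and Operators, so nothing here should be read as defining them. The sketch also accepts subtypes of the named type in treat_as, which is an assumption. All function and class names are hypothetical.

```python
class Animal:
    pass


class Cat(Animal):
    pass


class Dog(Animal):
    pass


def cast_as_integer(value):
    """Model of CAST: produce a new value of the target type."""
    return int(value)


def treat_as(type_, value):
    """Model of TREAT: the value is unchanged; only its assumed type narrows.

    An error is raised at execution time if the dynamic type does not conform.
    """
    if not isinstance(value, type_):
        raise TypeError(
            f"dynamic type {type(value).__name__} is not {type_.__name__}"
        )
    return value


def purr(cat):
    """A function that is meaningful only for Cat values."""
    return "purr"


mypet = Cat()
result = purr(treat_as(Cat, mypet))  # succeeds because the dynamic type is Cat
```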

Ed. Note: The functionality of TREAT can also be expressed using TYPESWITCH. A proposal to remove the TREAT expression is under consideration. See issue xquery-remove-treat in Appendix C.

2.12 Structure of a Query Module

The XQuery language consists of units called query modules. Each query module is independent, but for convenience, multiple query modules, separated by semicolons, can be parsed together. For example, a test suite for an XQuery parser might consist of a file containing several query modules.

Ed. Note: The definition and syntax of a query module are still under discussion in the working group. The specifications in this section are pending approval by the working group. See issue xquery-module-syntax in Appendix C.

QueryModuleList   ::=   QueryModule ( ";" QueryModule)*
QueryModule   ::=   ContextDecl* FunctionDefn* ExprSequence?
ContextDecl   ::=   ("namespace" NCName "=" StringLiteral)
| ("default" "namespace" "=" StringLiteral)
| ("schema" StringLiteral StringLiteral)
Expr   ::=   SortExpr
| OrExpr
| AndExpr
| BeforeAfterExpr
| FLWRExpr
| IfExpr
| SomeExpr
| EveryExpr
| TypeSwitchExpr
| EqualityExpr
| RelationalExpr
| InstanceofExpr
| RangeExpr
| AdditiveExpr
| MultiplicativeExpr
| UnaryExpr
| UnionExpr
| IntersectExceptExpr
| PathExpr

The first part of a query module consists of declarations of namespace prefixes and external schemas that are used in the remainder of the query module. Each namespace prefix used in a query module must be defined in a namespace declaration that associates it with a Uniform Resource Identifier (URI), as in the following example:

(Q32) In the document "zoo.xml", find <tiger> elements in the http://www.abc.example.com/names namespace that contain any subelement in the http://www.xyz.example.com/names namespace.
NAMESPACE abc = "http://www.abc.example.com/names"
NAMESPACE xyz = "http://www.xyz.example.com/names"

document("zoo.xml")//abc:tiger[.//xyz:*]

A namespace declaration can also be used to declare a default namespace that applies to all unqualified element names used in a query, as shown in the following example:

(Q32, alternate form) (Equivalent to Q32)
DEFAULT NAMESPACE = "http://www.abc.example.com/names"
NAMESPACE xyz = "http://www.xyz.example.com/names"

document("zoo.xml")//tiger[.//xyz:*]

If no default namespace is declared for a query, names without prefixes used in the query module match nodes with unqualified names.
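
The effect of namespace-qualified matching in Q32 can be illustrated with Python's standard xml.etree.ElementTree module. This is a sketch under stated assumptions: the sample document below is a hypothetical stand-in for "zoo.xml", "subelement" is read as "descendant element", and the helper names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

ABC = "http://www.abc.example.com/names"
XYZ = "http://www.xyz.example.com/names"

# Hypothetical stand-in for "zoo.xml".
ZOO_XML = f"""<zoo xmlns:abc="{ABC}" xmlns:xyz="{XYZ}">
  <abc:tiger id="t1"><cage><xyz:keeper/></cage></abc:tiger>
  <abc:tiger id="t2"><abc:stripe/></abc:tiger>
  <tiger id="t3"><xyz:keeper/></tiger>
</zoo>"""


def in_namespace(element, uri):
    """True if the element's expanded name lies in the given namespace."""
    return isinstance(element.tag, str) and element.tag.startswith("{" + uri + "}")


def tigers_with_xyz_subelement(root):
    """Tiger elements in ABC that have at least one descendant element in XYZ."""
    return [
        tiger
        for tiger in root.iter("{" + ABC + "}tiger")
        if any(in_namespace(d, XYZ) for d in tiger.iter() if d is not tiger)
    ]


zoo = ET.fromstring(ZOO_XML)
matches = [t.get("id") for t in tigers_with_xyz_subelement(zoo)]
# An unprefixed name with no default namespace matches only unqualified nodes.
unqualified = [t.get("id") for t in zoo.iter("tiger")]
```

In this sample, only t1 satisfies the query: t2 has no subelement in the second namespace, and t3 is an unqualified tiger element that belongs to neither namespace.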

A query module may also contain schema declarations, which specify the URIs of schemas that are associated with particular namespaces. In the following example, each of the namespace declarations is followed by a schema declaration giving the URI of the schema that is associated with the namespace:

(Q32, second alternate form) (Equivalent to Q32, with explicit schemas)
DEFAULT NAMESPACE = "http://www.abc.example.com/names"

SCHEMA "http://www.abc.example.com/names" 

NAMESPACE xyz = "http://www.xyz.example.com/names"
SCHEMA "http://www.xyz.example.com/names" 

document("zoo.xml")//tiger[.//xyz:*]


Following the namespace and schema declarations, a query module may contain one or more function definitions. Each function defined in a query module can be invoked anywhere in that query module. A query module that consists only of declarations and function definitions is called a function library.

Ed. Note: The means by which a query module gains access to the functions defined in an external function library remains to be defined. See issue xquery-import in Appendix C.

Following the declarations and function definitions, a query module may contain an expression or sequence of expressions that represents the query itself. The result of the query is the value of this expression or sequence.

3 Example Applications

This section describes several interesting types of queries that can be expressed using the syntax introduced in the previous section.

3.1 Filtering

One of the functions in the XQuery core function library is called filter. This function takes a single parameter, which can be any expression. The function evaluates its argument and returns a shallow copy of the nodes that are selected by the argument, preserving any relationships that exist among these nodes. For example, suppose that the argument to filter is a path expression that selects nodes X, Y, and Z from some document. Suppose that, in the original document, nodes Y and Z are descendants (at any level) of node X. Then the result of filter is a copy of node X, with copies of nodes Y and Z as its immediate children. Any other intervening nodes from the original document are not present in the result. The name filter suggests a function that operates on a document to extract the parts that are of interest and discard the remainder, while retaining the structure of the original document.

The semantics of filter are illustrated by Figure 2. Suppose that the left side of Figure 2 represents a node hierarchy that is bound to the variable $doc. The right side of Figure 2 shows the result of the function filter($doc // (A | B)). The result contains copies of all nodes of type A and B in the original hierarchy, with their original relationships preserved. Note that the action of the filter function may split a node hierarchy into multiple hierarchies (preserving the sequential relationships among the root nodes of the resulting hierarchies).

Figure 2: Action of the filter function
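
The behavior described above can be approximated in Python with xml.etree.ElementTree. This is a minimal sketch, not the definition of filter: it handles element nodes only, copies tag names and attributes but not text, and takes a Python predicate in place of an XQuery expression. The sample hierarchy is hypothetical and is not the one shown in Figure 2.

```python
import xml.etree.ElementTree as ET


def filter_tree(root, selected):
    """Return shallow copies of the selected elements as a list of result roots.

    Each copy is attached to the copy of its nearest selected ancestor, so
    intervening unselected elements disappear from the result.
    """
    result_roots = []

    def visit(node, nearest_copy):
        if selected(node):
            copy = ET.Element(node.tag, dict(node.attrib))
            if nearest_copy is None:
                result_roots.append(copy)  # no selected ancestor: new hierarchy
            else:
                nearest_copy.append(copy)
            nearest_copy = copy
        for child in node:                 # document order is preserved
            visit(child, nearest_copy)

    visit(root, None)
    return result_roots


def shape(node):
    """Nested (tag, children) summary of an element, for easy comparison."""
    return (node.tag, [shape(child) for child in node])


doc = ET.fromstring("<C><A><C><B/></C><B/></A><B><D><A/></D></B></C>")
roots = filter_tree(doc, lambda n: n.tag in ("A", "B"))
shapes = [shape(r) for r in roots]
```

In this sample, the unselected root C is dropped, so the single input hierarchy is split into two result hierarchies: an A with two B children, followed by a B with one A child.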

The following example illustrates how filter might be used to compute a table of contents for a document that contains many levels of nested sections. The query filters the document, retaining only section elements, title elements nested directly inside section elements, and the text of those title elements. Other elements, such as paragraphs and figure titles, are eliminated, leaving only the "skeleton" of the document.

(Q33) Prepare a table of contents for the document "cookbook.xml", containing nested sections and their titles.
   filter(document("cookbook.xml") //
      (section | section/title | section/title/text()))

3.2 Joins

Joins, which combine data from multiple sources into a single result, are a very important type of query. In this section we will illustrate how several types of joins can be expressed in XQuery. We will base our examples on the following three documents:

  1. A document named "parts.xml" that contains many <part> elements; each <part> element in turn contains <partno> and <description> subelements.

  2. A document named "suppliers.xml" that contains many <supplier> elements; each <supplier> element in turn contains <suppno> and <suppname> subelements.

  3. A document named "catalog.xml" that contains information about the relationships between suppliers and parts. The catalog document contains many <item> elements, each of which in turn contains <partno>, <suppno>, and <price> subelements.

A conventional ("inner") join returns information from two or more related sources, as illustrated by the following example, which combines information from three documents:

(Q34) Generate a "descriptive catalog" derived from the catalog document, but containing part descriptions instead of part numbers and supplier names instead of supplier numbers. Order the new catalog alphabetically by part description and secondarily by supplier name.
     FOR $i IN document("catalog.xml")//item,
         $p IN document("parts.xml")//part[partno = $i/partno],
         $s IN document("suppliers.xml")//supplier[suppno = $i/suppno]
     RETURN
        <item>
           { $p/description, $s/suppname, $i/price }
        </item>
     SORTBY(description, suppname)
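
The logic of this inner join can be sketched in Python with xml.etree.ElementTree. The sketch assumes small in-memory documents with hypothetical sample data in place of the three files, compares values as strings, and returns plain tuples rather than constructed elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the three documents.
PARTS = ET.fromstring(
    "<parts>"
    "<part><partno>1</partno><description>bolt</description></part>"
    "<part><partno>2</partno><description>anvil</description></part>"
    "<part><partno>3</partno><description>washer</description></part>"
    "</parts>")
SUPPLIERS = ET.fromstring(
    "<suppliers>"
    "<supplier><suppno>10</suppno><suppname>Acme</suppname></supplier>"
    "<supplier><suppno>20</suppno><suppname>Zenith</suppname></supplier>"
    "<supplier><suppno>30</suppno><suppname>Idle</suppname></supplier>"
    "</suppliers>")
CATALOG = ET.fromstring(
    "<catalog>"
    "<item><partno>1</partno><suppno>10</suppno><price>0.50</price></item>"
    "<item><partno>1</partno><suppno>20</suppno><price>0.40</price></item>"
    "<item><partno>2</partno><suppno>10</suppno><price>90.00</price></item>"
    "</catalog>")


def descriptive_catalog(catalog, parts, suppliers):
    """Inner join: one row per item with a matching part and supplier."""
    rows = []
    for i in catalog.iter("item"):
        for p in parts.iter("part"):
            if p.findtext("partno") != i.findtext("partno"):
                continue
            for s in suppliers.iter("supplier"):
                if s.findtext("suppno") == i.findtext("suppno"):
                    rows.append((p.findtext("description"),
                                 s.findtext("suppname"),
                                 i.findtext("price")))
    # Order by description, then by supplier name.
    return sorted(rows, key=lambda row: (row[0], row[1]))


rows = descriptive_catalog(CATALOG, PARTS, SUPPLIERS)
```

With this sample data, the part "washer" and the supplier "Idle" do not appear in the result, because neither has a matching catalog item.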

Q34 returns information only about parts that have suppliers and suppliers that have parts. An "outer join" is a join that preserves information from one or more of the participating sources, including elements that have no matching element in the other source. For example, a "left outer join" between suppliers and parts might return information about suppliers that have no matching parts. Q35 is an example of a left outer join.

(Q35) Return names of all the suppliers in alphabetic order, including those that supply no parts; inside each supplier element, list the descriptions of all the parts it supplies, in alphabetic order.
FOR $s IN document("suppliers.xml")//supplier
RETURN
   <supplier>
      { $s/suppname }
      { FOR $i IN document("catalog.xml")//item
                 [suppno = $s/suppno],
            $p IN document("parts.xml")//part
                 [partno = $i/partno]
        RETURN $p/description SORTBY(.) }
   </supplier>
SORTBY(suppname)
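
The left outer join can be modeled in Python as follows. As before, this is only a sketch: the in-memory sample data is hypothetical, values are compared as strings, and the result is a list of tuples rather than constructed elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the three documents.
PARTS = ET.fromstring(
    "<parts>"
    "<part><partno>1</partno><description>bolt</description></part>"
    "<part><partno>2</partno><description>anvil</description></part>"
    "<part><partno>3</partno><description>washer</description></part>"
    "</parts>")
SUPPLIERS = ET.fromstring(
    "<suppliers>"
    "<supplier><suppno>10</suppno><suppname>Acme</suppname></supplier>"
    "<supplier><suppno>20</suppno><suppname>Zenith</suppname></supplier>"
    "<supplier><suppno>30</suppno><suppname>Idle</suppname></supplier>"
    "</suppliers>")
CATALOG = ET.fromstring(
    "<catalog>"
    "<item><partno>1</partno><suppno>10</suppno><price>0.50</price></item>"
    "<item><partno>1</partno><suppno>20</suppno><price>0.40</price></item>"
    "<item><partno>2</partno><suppno>10</suppno><price>90.00</price></item>"
    "</catalog>")


def suppliers_with_parts(suppliers, catalog, parts):
    """Left outer join: every supplier appears, even with no parts."""
    result = []
    for s in suppliers.iter("supplier"):
        descriptions = sorted(
            p.findtext("description")
            for i in catalog.iter("item")
            if i.findtext("suppno") == s.findtext("suppno")
            for p in parts.iter("part")
            if p.findtext("partno") == i.findtext("partno")
        )
        result.append((s.findtext("suppname"), descriptions))
    # Suppliers in alphabetic order by name.
    return sorted(result, key=lambda pair: pair[0])


listing = suppliers_with_parts(SUPPLIERS, CATALOG, PARTS)
```

The outer loop over suppliers is what preserves suppliers with no parts: the nested iteration simply yields an empty list for them instead of removing them from the result.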

Q35 preserves information about suppliers that supply no parts. Another type of join, called a "full outer join", might be used to preserve information about both suppliers that supply no parts and parts that have no supplier. The result of a full outer join can be structured in any of several ways. The example in Q36 uses a format of parts nested inside suppliers, followed by a sequence of parts that have no supplier. This might be thought of as a "supplier-centered" full outer join. A "part-centered" full outer join, on the other hand, might return a sequence of suppliers nested inside parts, followed by a sequence of suppliers that have no parts. Other forms of outer join queries are also possible.

(Q36) Return names of suppliers and descriptions and prices of their parts, including suppliers that supply no parts and parts that have no suppliers.
<master_list> {
    FOR $s IN document("suppliers.xml")//supplier
    RETURN
       <supplier>
          { $s/suppname }
          { FOR $i IN document("catalog.xml")//item
                     [suppno = $s/suppno],
                $p IN document("parts.xml")//part
                     [partno = $i/partno]
            RETURN <part> { $p/description, $i/price } </part>
            SORTBY (description) }
       </supplier>
    ,
    # parts that have no supplier
    <orphan_parts>
       { FOR $p IN document("parts.xml")//part
         WHERE empty(document("catalog.xml")//item
               [partno = $p/partno] )
         RETURN $p/description }
    </orphan_parts>
} </master_list>

Q36 uses an element constructor to enclose its output inside a <master_list> element. The concatenation operator (",") is used to combine the two main parts of the query. The result is an ordered sequence of <supplier> elements followed by an <orphan_parts> element that contains descriptions of all the parts that have no supplier.
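
A "supplier-centered" full outer join of this kind can be modeled in Python. The sketch below assumes hypothetical in-memory sample data and returns a pair of Python lists in place of the <master_list> element: the first list corresponds to the sequence of <supplier> elements and the second to the content of <orphan_parts>.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the three documents.
PARTS = ET.fromstring(
    "<parts>"
    "<part><partno>1</partno><description>bolt</description></part>"
    "<part><partno>2</partno><description>anvil</description></part>"
    "<part><partno>3</partno><description>washer</description></part>"
    "</parts>")
SUPPLIERS = ET.fromstring(
    "<suppliers>"
    "<supplier><suppno>10</suppno><suppname>Acme</suppname></supplier>"
    "<supplier><suppno>20</suppno><suppname>Zenith</suppname></supplier>"
    "<supplier><suppno>30</suppno><suppname>Idle</suppname></supplier>"
    "</suppliers>")
CATALOG = ET.fromstring(
    "<catalog>"
    "<item><partno>1</partno><suppno>10</suppno><price>0.50</price></item>"
    "<item><partno>1</partno><suppno>20</suppno><price>0.40</price></item>"
    "<item><partno>2</partno><suppno>10</suppno><price>90.00</price></item>"
    "</catalog>")


def master_list(suppliers, catalog, parts):
    """Full outer join: suppliers with their parts, then unsupplied parts."""
    supplier_entries = []
    for s in suppliers.iter("supplier"):
        supplied = sorted(
            ((p.findtext("description"), i.findtext("price"))
             for i in catalog.iter("item")
             if i.findtext("suppno") == s.findtext("suppno")
             for p in parts.iter("part")
             if p.findtext("partno") == i.findtext("partno")),
            key=lambda pair: pair[0],  # order parts by description
        )
        supplier_entries.append((s.findtext("suppname"), supplied))

    # Parts for which the catalog holds no item at all.
    supplied_partnos = {i.findtext("partno") for i in catalog.iter("item")}
    orphan_parts = [p.findtext("description")
                    for p in parts.iter("part")
                    if p.findtext("partno") not in supplied_partnos]
    return supplier_entries, orphan_parts


supplier_entries, orphan_parts = master_list(SUPPLIERS, CATALOG, PARTS)
```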

3.3 Grouping

Many queries involve forming data into groups and applying some aggregation function such as count or avg to each group. The following example shows how such a query might be expressed in XQuery, using the catalog document defined in the previous section:

(Q37) Find the part number and average price for parts that have at least 3 suppliers.
FOR $pn IN distinct(document("catalog.xml")//partno)
LET $i := document("catalog.xml")//item[partno = $pn]
WHERE count($i) >= 3
RETURN
   <well_supplied_item>
      { $pn }
      <avgprice> {avg($i/price)} </avgprice>
   </well_supplied_item>

The distinct function in this query eliminates duplicate part numbers from the set of all part numbers in the catalog document. The result of distinct is a sequence in which order is not significant.

Note that $pn, bound by a FOR-clause, represents an individual part number, whereas $i, bound by a LET-clause, represents a set of items that serves as the argument to the aggregate functions count($i) and avg($i/price). The query uses an element constructor to enclose each part number and average price in a containing element called <well_supplied_item>.
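
The roles of the FOR-bound and LET-bound variables can be seen in a Python model of the same grouping logic. The sketch assumes a hypothetical in-memory catalog in which each item represents one supplier of a part, and it returns (part number, average price) tuples instead of constructed elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: part 1 has three suppliers, part 2 has two.
CATALOG = ET.fromstring(
    "<catalog>"
    "<item><partno>1</partno><suppno>10</suppno><price>10</price></item>"
    "<item><partno>1</partno><suppno>20</suppno><price>20</price></item>"
    "<item><partno>2</partno><suppno>10</suppno><price>5</price></item>"
    "<item><partno>1</partno><suppno>30</suppno><price>30</price></item>"
    "<item><partno>2</partno><suppno>20</suppno><price>7</price></item>"
    "</catalog>")


def well_supplied_items(catalog, min_suppliers=3):
    """Group items by part number and keep groups of sufficient size."""
    # Counterpart of distinct(): duplicate part numbers are eliminated.
    part_numbers = list(dict.fromkeys(pn.text for pn in catalog.iter("partno")))
    result = []
    for pn in part_numbers:                      # FOR: one part number at a time
        items = [i for i in catalog.iter("item")  # LET: the whole set of items
                 if i.findtext("partno") == pn]
        if len(items) >= min_suppliers:           # WHERE count($i) >= 3
            prices = [float(i.findtext("price")) for i in items]
            result.append((pn, sum(prices) / len(prices)))
    return result


groups = well_supplied_items(CATALOG)
```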

4 Conclusion

With the emergence of XML, the distinctions among various forms of information, such as documents and databases, are quickly disappearing. XQuery is designed to support queries against a broad spectrum of information sources. The versatility of XQuery will help XML to realize its potential as a universal medium for data interchange.

This specification describes XQuery Version 1. Future versions of XQuery may include additional features such as the following:

  1. Data definition facilities for persistent views.

  2. Function overloading and polymorphic functions.

  3. Facilities for updating XML data.

  4. An extensibility mechanism whereby function libraries can be created, containing functions implemented in various programming languages.

B XQuery Grammar

This appendix contains an LL(1) grammar for XQuery. We will be collecting grammars for specific parser generators, executable parsers, and collections of queries from the XML Query documents on the XML Query Working Group's public home page (http://www.w3.org/XML/Query).

[1]   QueryModuleList   ::=   QueryModule ( ";" QueryModule)*
[2]   QueryModule   ::=   ContextDecl* FunctionDefn* ExprSequence?
[3]   ContextDecl   ::=   ("namespace" NCName "=" StringLiteral)
| ("default" "namespace" "=" StringLiteral)
| ("schema" StringLiteral StringLiteral)
[4]   FunctionDefn   ::=   "define" "function" QName "(" ParamList? ")"
           ("returns" Datatype)? EnclosedExpr
[5]   ParamList   ::=   Param ("," Param)*
[6]   Param   ::=   Datatype? Variable
[7]   Expr   ::=   SortExpr
| OrExpr
| AndExpr
| BeforeAfterExpr
| FLWRExpr
| IfExpr
| SomeExpr
| EveryExpr
| TypeSwitchExpr
| EqualityExpr
| RelationalExpr
| InstanceofExpr
| RangeExpr
| AdditiveExpr
| MultiplicativeExpr
| UnaryExpr
| UnionExpr
| IntersectExceptExpr
| PathExpr
[8]   SortExpr   ::=   Expr "sortby" "(" SortSpecList ")"
[9]   SortSpecList   ::=   Expr ("ascending" | "descending")? ("," SortSpecList)?
[10]   OrExpr   ::=   Expr "or" Expr
[11]   AndExpr   ::=   Expr "and" Expr
[12]   BeforeAfterExpr   ::=   Expr ("before" | "after") Expr
[13]   FLWRExpr   ::=   (ForClause | LetClause)+ WhereClause? "return" Expr
[14]   ForClause   ::=   "for" Variable "in" Expr ("," Variable "in" Expr)*
[15]   LetClause   ::=   "let" Variable ":=" Expr ("," Variable ":=" Expr)*
[16]   WhereClause   ::=   "where" Expr
[17]   IfExpr   ::=   "if" "(" Expr ")" "then" Expr "else" Expr
[18]   SomeExpr   ::=   "some" Variable "in" Expr "satisfies" Expr
[19]   EveryExpr   ::=   "every" Variable "in" Expr "satisfies" Expr
[20]   TypeSwitchExpr   ::=   "typeswitch" "(" Expr ")" ("as" Variable)?
           CaseClause+ "default" "return" Expr
[21]   CaseClause   ::=   "case" Datatype "return" Expr
[22]   EqualityExpr   ::=   Expr ("=" | "!=" | "==" | "!==") Expr
[23]   RelationalExpr   ::=   Expr ("<" | "<=" | ">" | ">=") Expr
[24]   InstanceofExpr   ::=   Expr "instanceof" "only"? Datatype
[25]   RangeExpr   ::=   Expr "to" Expr
[26]   AdditiveExpr   ::=   Expr ("+" | "-") Expr
[27]   MultiplicativeExpr   ::=   Expr ("*" | "div" | "mod") Expr
[28]   UnaryExpr   ::=   ("-" | "+") Expr
[29]   UnionExpr   ::=   Expr ("union" | "|") Expr
[30]   IntersectExceptExpr   ::=   Expr ("intersect" | "except") Expr
[31]   PathExpr   ::=   RelativePathExpr
| ("/" RelativePathExpr?)
| ("//" RelativePathExpr?)
[32]   RelativePathExpr   ::=   StepExpr ( ("/" | "//") StepExpr)*
[33]   StepExpr   ::=   AxisStepExpr | OtherStepExpr
[34]   AxisStepExpr   ::=   Axis NodeTest StepQualifiers
[35]   OtherStepExpr   ::=   PrimaryExpr StepQualifiers
[36]   StepQualifiers   ::=   ( ("[" Expr "]") | ("=>" NameTest) )*
[37]   Axis   ::=   (NCName "::") | "@"
[38]   PrimaryExpr   ::=   "."
| ".."
| NodeTest
| Variable
| Literal
| FunctionCall
| ParenthesizedExpr
| CastExpr
| ElementConstructor
[39]   Literal   ::=   NumericLiteral | StringLiteral
[40]   NodeTest   ::=   NameTest | KindTest
[41]   NameTest   ::=   QName | Wildcard
[42]   KindTest   ::=   PITest | CommentTest | TextTest | AnyKindTest
[43]   PITest   ::=   "processing-instruction" "(" StringLiteral? ")"
[44]   CommentTest   ::=   "comment" "(" ")"
[45]   TextTest   ::=   "text" "(" ")"
[46]   AnyKindTest   ::=   "node" "(" ")"
[47]   ParenthesizedExpr   ::=   "(" ExprSequence? ")"
[48]   ExprSequence   ::=   Expr ("," Expr)*
[49]   FunctionCall   ::=   QName "(" (Expr ("," Expr)*)? ")"
[50]   CastExpr   ::=   (("cast" "as") | ("treat" "as")) Datatype "(" Expr ")"
[51]   Datatype   ::=   QName
[52]   ElementConstructor   ::=   "<" NameSpec AttributeList ("/>" |
           (">" ElementContent* "</" (QName S?)? ">") )
[53]   NameSpec   ::=   QName | ( "{" Expr "}" )
[54]   AttributeList   ::=   (S (NameSpec S? "=" S? (AttributeValue
           | EnclosedExpr) AttributeList)? )?
[55]   AttributeValue   ::=   ( ["] AttributeValueContent* ["] )
| ( ['] AttributeValueContent* ['] )
[56]   ElementContent   ::=   Char
| ElementConstructor
| EnclosedExpr
| CdataSection
| CharRef
| PredefinedEntityRef
[57]   AttributeValueContent   ::=   Char
| CharRef
| EnclosedExpr
| PredefinedEntityRef
[58]   CdataSection   ::=   "<![CDATA[" Char* "]]>"
[59]   EnclosedExpr   ::=   "{" ExprSequence "}"

Precedence order of expressions, from highest precedence to lowest precedence (within each precedence level, operators are applied from left to right):

Lexical structure
[60]   StringLiteral   ::=   (["] [^"]* ["]) | (['] [^']* ['])
[61]   NumericLiteral   ::=   (("." [0-9]+) | ([0-9]+ ("." [0-9]+?)?)) ([eE] [+-]? [0-9]+)?
[62]   Wildcard   ::=   "*" | (NCName ":") | ("*:" NCName)
[63]   Variable   ::=   "$" QName
[64]   PredefinedEntityRef   ::=   "&" ("lt" | "gt" | "amp" | "quot" | "apos") ";"
[65]   CharRef   ::=   "&#" ([0-9]+ | ("x" [0-9a-fA-F]+)) ";"

Char and S are defined in [XML 1.0]. QName and NCName are defined in [Namespaces].
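
Several of the lexical productions above translate directly into regular expressions. The following Python sketch covers productions [60], [64], and [65] only; it is an informal transcription for experimentation, not a normative tokenizer, and the constant and function names are hypothetical.

```python
import re

# [60] StringLiteral: double- or single-quoted, with no embedded delimiter.
STRING_LITERAL = re.compile(r'"[^"]*"|\'[^\']*\'')

# [64] PredefinedEntityRef: one of the five predefined entity names.
PREDEFINED_ENTITY_REF = re.compile(r"&(?:lt|gt|amp|quot|apos);")

# [65] CharRef: decimal or hexadecimal character reference.
CHAR_REF = re.compile(r"&#(?:[0-9]+|x[0-9a-fA-F]+);")


def matches(pattern, text):
    """True if the whole of text is one instance of the production."""
    return pattern.fullmatch(text) is not None


ok_string = matches(STRING_LITERAL, '"zoo.xml"')   # True
bad_string = matches(STRING_LITERAL, '"a"b"')      # False: embedded delimiter
```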

