Chapter 2. Language reference
XProc is a data flow programming language described with an XML
vocabulary. This chapter provides an overview of the features of the language.
At a high level, XProc allows you to combine steps, units of computation, in a
variety of ways to achieve your goal. The p:for-each step, for example, will
iterate over a set of documents and the p:xslt step will perform XSLT
transformations.
Broadly speaking, the features are:
Structures for declaring pipelines,
structures for connecting steps to inputs,
compound steps,
atomic steps,
options,
variables,
and extra information
The following sections give a brief overview of the elements in the XProc vocabulary.
This section is something of a work-in-progress. At the moment, it’s neither a comprehensive description of every aspect of the vocabulary, nor is it tutorial in nature. But it’s useful to have every element in the vocabulary present in the reference. Suggestions for improvements are welcome.
In the summaries that follow, {any-name}* generally means any number of
additional namespace qualified names. These are roughly extension attributes
and are ignored unless the processor uses them for some implementation-defined
purpose.
2.1. Declaring pipelines
The most common pipeline declaration specifies the inputs, outputs, and options that the step accepts, followed by the steps that implement the pipeline. Pipelines may also import libraries and functions, and may declare steps.
<p:declare-step | |
name? = NCName | The step name |
type? = EQName | The step type (for reuse) |
psvi-required? = boolean | Is XML Schema validated input required? |
xpath-version? = decimal | The XPath version required |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
version? = decimal | The XProc version (3.0 or 3.1) |
visibility? = private|public | Visible outside the library? |
{any-name}* = string | Additional attributes |
> | |
(import | import-functions)*, (input | output | option)*, declare-step*, subpipeline? | |
</p:declare-step> |
The example pipeline in Example 2.1, “A compound step declaration” shows a typical compound step declaration. In brief: it iterates over the files in a directory replacing selected copyright elements. We’ll look at several of its features in more detail below.
name
The step name is only used by the subpipeline in the declaration. It’s how a step in the subpipeline can refer, for example with p:pipe, to one of the step’s inputs.
|<p:declare-step name="main">
|  <p:input port="source"/>
|  …
|  <p:identity>
|    <p:with-input>
|      <p:pipe step="main" port="source"/>
|    </p:with-input>
|  </p:identity>
|  …
|</p:declare-step>
type
The step type is how you reuse a step. If you declare a step with the type ex:my-step, then you can subsequently use it as an atomic step, <ex:my-step>, in other steps, even recursively.
|<p:declare-step name="main" xmlns:ex="http://example.com/ns">
|  <p:input port="source"/>
|  <p:declare-step type="ex:my-step">
|    <p:input port="source"/>
|    …
|  </p:declare-step>
|  …
|  <ex:my-step>
|    <p:with-input pipe="source@main"/>
|  </ex:my-step>
|  …
|</p:declare-step>
psvi-required
If this is true, you’re telling the processor that XML Schema validated inputs are required. This will require Saxon EE.
xpath-version
This specifies the XPath version. The only version that you can use today is “3.1”, but in the future, it might be possible to specify other versions.
exclude-inline-prefixes
When you put XML in a p:inline element, all of the in-scope namespaces will apply to those elements. You can use exclude-inline-prefixes to exclude some of them. The value of the attribute must be a space-separated list of in-scope namespace prefixes. It’s an error to refer to a prefix that doesn’t have an in-scope declaration. The tokens #default and #all may also be used to exclude the default namespace and all namespaces, respectively.
version
The XProc version of the step. Only 3.0 or 3.1 are accepted and they’re equivalent.
The example pipelines and some input documents to demonstrate how they work are available from an examples directory in the repository.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                name="main" version="3.1">
|  <p:documentation>
|    <div xmlns="http://www.w3.org/1999/xhtml">
|      <p>This pipeline reads all of the files in a directory and
|      updates the copyright element.</p>
|    </div>
|  </p:documentation>
|
|  <p:input port="copyright" content-types="xml"/>
|  <p:output port="result" content-types="text"/>
|  <p:option name="path" required="true" as="xs:string"/>
|  <p:option name="output-path" required="true" as="xs:anyURI"/>
|  <p:option name="recurse" select="false()" as="xs:boolean"/>
|
|  <p:directory-list name="listing" path="{$path}"
|                    include-filter=".*\.xml$">
|    <p:with-option name="max-depth"
|                   select="if ($recurse) then 'unbounded' else '1'"/>
|  </p:directory-list>
|
|  <p:for-each name="loop">
|    <p:with-input select="//c:file"/>
|    <p:variable name="filename" select="/*/@name"/>
|
|    <p:load href="{resolve-uri(/*/@name, base-uri(/*))}"/>
|
|    <p:viewport match="copyright[. = 'Someone Random']">
|      <p:identity>
|        <p:with-input pipe="copyright@main"/>
|      </p:identity>
|    </p:viewport>
|
|    <p:store href="{resolve-uri($filename, resolve-uri($output-path, static-base-uri()))}"/>
|  </p:for-each>
|
|  <p:variable name="total" select="count(//c:file)">
|    <p:pipe step="listing"/>
|  </p:variable>
|
|  <p:identity>
|    <p:with-input xmlns:f="http://example.com/ns/functions">
|      <p:inline content-type="text/plain">Processed {$total} files; {f:is-leap-day()} </p:inline>
|    </p:with-input>
|  </p:identity>
|</p:declare-step>
A simpler form of declaration specifies the inputs, outputs, and options that the step accepts, but relies on the implementation having been provided through some other means. The XML Calabash extension steps, or extension steps that you write in a JVM language yourself, follow this pattern.
<p:declare-step | |
name? = NCName | The step name |
type? = EQName | The step type (for reuse) |
psvi-required? = boolean | Is XML Schema validated input required? |
xpath-version? = decimal | The XPath version required |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
version? = decimal | The XProc version (3.0 or 3.1) |
visibility? = private|public | Visible outside the library? |
{any-name}* = string | Additional attributes |
> | |
(input | output | option)* | |
</p:declare-step> |
An example atomic declaration is shown in Example 2.2, “An atomic step declaration”.
This is the declaration for the extension step cx:fileset.
|<p:declare-step type="cx:fileset" version="3.1"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema">
|  <p:input port="source" content-types="xml" sequence="true">
|    <p:empty/>
|  </p:input>
|  <p:output port="result" content-types="xml" sequence="true"/>
|  <p:option name="path" as="xs:string" required="true"/>
|  <p:option name="default-excludes" as="xs:boolean" select="true()"/>
|  <p:option name="case-sensitive" as="xs:boolean" select="true()"/>
|  <p:option name="error-on-missing-dir" as="xs:boolean" select="true()"/>
|  <p:option name="follow-symlinks" as="xs:boolean" select="true()"/>
|  <p:option name="includes" as="xs:string?"/>
|  <p:option name="excludes" as="xs:string?"/>
|  <p:option name="detailed" as="xs:boolean" select="false()"/>
|</p:declare-step>
2.1.1. Pipeline inputs
The p:input
element describes a step input.
<p:input | |
port = NCName | The port name |
sequence? = boolean | Accept a (possibly empty) sequence of documents? |
primary? = boolean | This is the primary port? |
select? = XPathExpression | XPath selection from the inputs |
content-types? = ContentTypes | Acceptable content types |
href? = { anyURI } | A document binding |
{any-name}* = string | Additional attributes |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
> | |
((empty | (document | inline)*) | anyElement*) | |
</p:input> |
The input in Example 2.3, “A pipeline input” has the port name “copyright” and
accepts only XML documents. If it’s the only input, it will be primary. The input
doesn’t specify sequence="true"
, so a single document is required.
port
The port name is how you refer to an input or output port. Its name must be unique.
sequence
If a sequence is allowed, any number of documents can be used on that port. If not, exactly one document must be used.
primary
If a port is primary, that’s where implicit connections are made. If there’s only one input or output port, it will be primary by default. If there’s more than one, none are primary unless indicated explicitly. At most one input port and one output port may be declared primary.
select
A select expression on an input is evaluated against each document that arrives on the port, and each selected node becomes a new input document. Selecting //chapter, for example, will make the input a sequence of documents, one for each chapter element that appears on the original input.
content-types
If a list of content types is provided, only documents that are of those types are allowed. Note that XProc allows both positive and negative content types.
href
Shortcut for a single p:document binding.
exclude-inline-prefixes
You can use exclude-inline-prefixes to exclude some namespaces from a p:inline, as noted about the attributes of p:declare-step. If the exclude-inline-prefixes attribute occurs multiple times among the ancestors of an inline, the effect is the union of all such prefixes.
|<p:input port="copyright" content-types="xml"/>
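To see how primary, sequence, and select interact, here is a minimal sketch. The port names, the chapter element, and the use of p:count are assumptions chosen for illustration; they aren’t part of the example pipeline.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <!-- Two inputs are declared, so neither is primary unless we say so. -->
  <!-- The select expression turns every chapter element into its own document. -->
  <p:input port="source" primary="true" sequence="true"
           content-types="xml" select="//chapter"/>

  <!-- A secondary input: exactly one XML document, never connected implicitly. -->
  <p:input port="copyright" content-types="xml"/>

  <p:output port="result"/>

  <!-- Connects implicitly to the primary input and counts the chapter documents. -->
  <p:count/>
</p:declare-step>
```

If you removed primary="true", the p:count step would have no default readable port to connect to and would need an explicit p:with-input.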
2.1.2. Pipeline outputs
The p:output
element describes a step output. There
are three slightly different forms, depending on where the output is being
declared. The pipeline and compound step forms are the same except that
on a pipeline, serialization parameters may also be declared.
<p:output | |
port? = NCName | The port name |
sequence? = boolean | Accept a (possibly empty) sequence of documents? |
primary? = boolean | This is the primary port? |
content-types? = ContentTypes | Acceptable content types |
href? = { anyURI } | A document binding |
pipe? = string | A pipe binding |
{any-name}* = string | Additional attributes |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
serialization? = map(xs:QName,item()*) | Serialization options |
> | |
((empty | (document | pipe | inline)*) | anyElement*) | |
</p:output> |
The output in Example 2.4, “A pipeline output” has the port name “result”. The pipeline result must be a single text document.
The notes about the attributes on p:input
apply to p:output
.
In addition, p:output
has a pipe
attribute that
is a shortcut for p:pipe
bindings. On a p:declare-step
it
may also have a map of serialization parameters.
|<p:output port="result" content-types="text"/>
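A short sketch of the pipe shortcut and the serialization map on p:output may help. The second output port (“original”) and the p:add-attribute step are assumptions made for the example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <p:input port="source"/>

  <!-- With two outputs, the primary one must be marked explicitly. -->
  <!-- The serialization map asks for indented output. -->
  <p:output port="result" primary="true"
            serialization="map{'indent': true()}"/>

  <!-- The pipe attribute is shorthand for
       <p:pipe step="main" port="source"/> inside the p:output. -->
  <p:output port="original" pipe="source@main"/>

  <!-- The last step's primary output is connected to "result" implicitly. -->
  <p:add-attribute match="/*" attribute-name="checked" attribute-value="true"/>
</p:declare-step>
```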
On a compound step, no serialization can occur, so it would be pointless to specify serialization parameters.
<p:output | |
port? = NCName | The port name |
sequence? = boolean | Accept a (possibly empty) sequence of documents? |
primary? = boolean | This is the primary port? |
content-types? = ContentTypes | Acceptable content types |
href? = { anyURI } | A document binding |
pipe? = string | A pipe binding |
{any-name}* = string | Additional attributes |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
> | |
((empty | (document | pipe | inline)*) | anyElement*) | |
</p:output> |
On the declaration of an atomic step, you also cannot provide any connections.
<p:output | |
port? = NCName | The port name |
sequence? = boolean | Accept a (possibly empty) sequence of documents? |
primary? = boolean | This is the primary port? |
content-types? = ContentTypes | Acceptable content types |
{any-name}* = string | Additional attributes |
/> |
The output on the cx:fileset
declaration indicates that
the step can produce a sequence of XML documents.
|<p:output port="result" content-types="xml" sequence="true"/>
2.1.3. Pipeline options
Options are declared with the p:option
element.
<p:option | |
name = EQName | The option name |
as? = XPathSequenceType | The required value type |
values? = string | Allowed values |
static? = boolean | Static value? |
required? = boolean | Option value is required? |
select? = XPathExpression | Default value |
{any-name}* = string | Additional attributes |
visibility? = private|public | Visible outside the library? |
/> |
Options can be required or have a default value and they can be typed.
name
The option name. Options cannot shadow earlier options; they must have unique names. Non-static options can be shadowed in the subpipeline by p:variable declarations.
as
The type of the option, for example “xs:integer” or “map(xs:string, xs:dateTime)”.
values
A list of atomic values. This forms an enumeration and the option must be one of these values.
static
Is the option evaluated at compile time? Static options can be used in use-when expressions for conditional element exclusion.
required
If an option is required, the caller must provide a value for it. If it isn’t required, and no value is provided, the default value is taken from the select attribute. (If there’s no select attribute, the default value is the empty sequence.)
select
Provides a default value for the option. Option default values can refer to preceding options, but not to the step inputs.
The compound step declaration in Example 2.1, “A compound step declaration” declares three options:
|<p:option name="path" required="true" as="xs:string"/>
|<p:option name="output-path" required="true" as="xs:anyURI"/>
|<p:option name="recurse" select="false()" as="xs:boolean"/>
The path
and output-path
options are required and must be
a string and a URI, respectively (there’s no practical reason to make them different types
in this case, it’s just to make the example more interesting).
The recurse
option is not required and will default to “false”.
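The values attribute and defaults that refer to earlier options aren’t exercised by the example pipeline, so here is a small sketch. The option names (status, label) and the p:add-attribute step are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                name="main" version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- An enumeration: callers may pass only 'draft' or 'final'. -->
  <!-- The option isn't required, so 'draft' is used when no value is given. -->
  <p:option name="status" as="xs:string"
            values="('draft', 'final')" select="'draft'"/>

  <!-- A default may refer to a preceding option, but not to the step inputs. -->
  <p:option name="label" as="xs:string" select="upper-case($status)"/>

  <p:add-attribute match="/*" attribute-name="status"
                   attribute-value="{$label}"/>
</p:declare-step>
```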
Options can be declared static:
<p:option | |
name = EQName | The option name |
as? = XPathSequenceType | The required value type |
values? = string | Allowed values |
static = "true" | Static value? |
select = XPathExpression | Default value |
{any-name}* = string | Additional attributes |
visibility? = private|public | Visible outside the library? |
/> |
Options inside a p:library
must be declared static.
It must be possible to evaluate a static option without reference to any pipeline
input documents. It is evaluated “at compile time”. You may not shadow a static option
with another option or p:variable
.
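As a rough illustration of a static option driving use-when, consider the following sketch. The debug option and both p:add-attribute steps are assumptions; how a static option is overridden when the pipeline is run depends on the processor.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                name="main" version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- A static option must have a select expression that can be
       evaluated without reading any input documents. -->
  <p:option name="debug" static="true" as="xs:boolean" select="false()"/>

  <p:add-attribute match="/*" attribute-name="processed" attribute-value="true"/>

  <!-- This step is removed from the pipeline at compile time
       unless the static option is true. -->
  <p:add-attribute use-when="$debug"
                   match="/*" attribute-name="debug" attribute-value="on"/>
</p:declare-step>
```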
2.1.4. Declaring libraries of steps
Several steps (and static options) can be bundled together in a library.
<p:library | |
psvi-required? = boolean | Is XML Schema validated input required? |
xpath-version? = decimal | The XPath version required |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
version? = decimal | The XProc version (3.0 or 3.1) |
{any-name}* = string | Additional attributes |
> | |
(import | import-functions)*, option*, declare-step* | |
</p:library> |
A collection of step declarations and (static) options can be put inside
a p:library
so that they can all be imported together.
The notes about the attributes on p:declare-step
apply to
p:library
.
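A library might look something like the following sketch. The step types ex:mark and ex:unmark, the marker option, and the namespace are all hypothetical.

```xml
<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://example.com/ns"
           version="3.1">
  <!-- Options in a library must be static. -->
  <p:option name="marker" static="true" as="xs:string" select="'reviewed'"/>

  <!-- Each step needs a type so that importing pipelines can call it. -->
  <p:declare-step type="ex:mark">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:add-attribute match="/*" attribute-name="status"
                     attribute-value="{$marker}"/>
  </p:declare-step>

  <p:declare-step type="ex:unmark">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:delete match="/*/@status"/>
  </p:declare-step>
</p:library>
```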
2.1.5. Importing libraries and function libraries
A library (or a single step) can be imported.
<p:import | |
{any-name}* = string | Additional attributes |
href = anyURI | Document URI |
/> |
The document URI must identify a pipeline or library document.
XML Calabash provides URIs for importing its extension steps. For example,
https://xmlcalabash.com/ext/library/fileset.xpl
for the
cx:fileset
extension step. Generally speaking, the specified URI is
retrieved and parsed for its declarations. The library URIs that XML Calabash
provides for its extension steps are resolved internally, without accessing the
internet.
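For instance, a pipeline that imports and calls cx:fileset might be sketched as follows. The option name comes from the declaration in Example 2.2; the directory name “docs” is an assumption, and how the path is resolved is up to the step.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="3.1">
  <!-- Resolved internally by XML Calabash; no network access is needed. -->
  <p:import href="https://xmlcalabash.com/ext/library/fileset.xpl"/>

  <p:output port="result" sequence="true"/>

  <!-- After the import, the step can be used like any other atomic step. -->
  <!-- "docs" is a hypothetical directory name. -->
  <cx:fileset path="docs"/>
</p:declare-step>
```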
XML Calabash can also import functions defined in XSLT or XQuery.
<p:import-functions | |
{any-name}* = string | Additional attributes |
href = anyURI | Document URI |
content-type? = ContentType | The content type |
namespace? = string | The namespace(s) to import |
/> |
Importing functions allows them to be used in expressions in the pipeline.
href
The document URI must identify a library of functions.
content-type
If a content-type is provided, it informs the processor what is expected from the imported library.
namespace
A whitespace-separated list of namespace URIs. If provided, only functions declared in one of those namespaces will be imported.
The stylesheet in Example 2.7, “A function library in XSLT” defines
two functions, f:is-leap-day
with no arguments and
f:is-leap-day
with a single date argument.
|<?xml version="1.0" encoding="utf-8"?>
|<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                xmlns:f="http://example.com/ns/functions"
|                exclude-result-prefixes="xs"
|                version="3.0">
|
|<!-- Zero-argument form: tests today's date. -->
|<xsl:function name="f:is-leap-day">
|  <xsl:sequence select="f:is-leap-day(current-date())"/>
|</xsl:function>
|
|<!-- One-argument form: accepts a date, a dateTime, or a value castable to either. -->
|<xsl:function name="f:is-leap-day">
|  <xsl:param name="date"/>
|  <xsl:choose>
|    <xsl:when test="$date instance of xs:date">
|      <xsl:sequence select="month-from-date($date) = 2
|                            and day-from-date($date) = 29"/>
|    </xsl:when>
|    <xsl:when test="$date instance of xs:dateTime">
|      <xsl:sequence select="month-from-dateTime($date) = 2
|                            and day-from-dateTime($date) = 29"/>
|    </xsl:when>
|    <xsl:when test="$date castable as xs:date">
|      <xsl:variable name="dt" select="$date cast as xs:date"/>
|      <xsl:sequence select="month-from-date($dt) = 2
|                            and day-from-date($dt) = 29"/>
|    </xsl:when>
|    <xsl:when test="$date castable as xs:dateTime">
|      <!-- $dt is an xs:dateTime here, so the dateTime accessors are required. -->
|      <xsl:variable name="dt" select="$date cast as xs:dateTime"/>
|      <xsl:sequence select="month-from-dateTime($dt) = 2
|                            and day-from-dateTime($dt) = 29"/>
|    </xsl:when>
|    <xsl:otherwise>
|      <xsl:sequence select="false()"/>
|    </xsl:otherwise>
|  </xsl:choose>
|</xsl:function>
|
|</xsl:stylesheet>
After the library has been imported, the functions that it defines can be used in XPath expressions in the pipeline. The pipeline in Example 2.8, “Using the function library” outputs a single document that answers the questions, “is today a leap day and is 29 February 2028 a leap day?”
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:f="http://example.com/ns/functions"
|                name="main" version="3.1">
|  <p:documentation>
|    <div xmlns="http://www.w3.org/1999/xhtml">
|      <p>Example of importing functions. This requires Saxon EE.</p>
|    </div>
|  </p:documentation>
|
|  <p:import-functions href="is-leap-day.xsl"/>
|
|  <p:output port="result" serialization="map{'indent':true()}"/>
|
|  <p:identity>
|    <p:with-input exclude-inline-prefixes="#all">
|      <leap-days>
|        <today date="{substring(string(current-date()), 1, 10)}"
|               >{f:is-leap-day()}</today>
|        <other date="2028-02-29"
|               >{f:is-leap-day('2028-02-29')}</other>
|      </leap-days>
|    </p:with-input>
|  </p:identity>
|</p:declare-step>
If you have Saxon EE and you run the pipeline, the output will be something like:
<leap-days>
<today date="2025-07-26">false</today>
<other date="2028-02-29">true</other>
</leap-days>
Note that the p:output
element in the pipeline uses the serialization
options to pretty-print the output and the p:with-input
uses
exclude-inline-prefixes
to avoid having the namespace
declaration for “f” on the output.
2.2. Connecting steps to inputs
Atomic and compound steps use p:with-input
to describe how
their inputs are connected.
<p:with-input | |
port? = NCName | The port name |
select? = XPathExpression | XPath selection from the inputs |
href? = { anyURI } | Document URI |
pipe? = string | A pipe binding |
{any-name}* = string | Additional attributes |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
> | |
((empty | (document | pipe | inline)*) | anyElement*) | |
</p:with-input> |
The notes about the attributes on p:input
apply to p:with-input
.
The input on the p:for-each
step in the example above does
not have any explicit bindings:
|<p:with-input select="//c:file"/>
That means it connects to the “default readable port”, usually the primary output of the preceding step.
2.2.1. Document inputs
The p:document element (or the href attribute on p:with-input) connects an
input to a document identified by a URI.
<p:document | |
href = { anyURI } | Document URI |
content-type? = string | The required content type |
document-properties? = map(xs:QName,item()*) | Document properties map |
parameters? = map(xs:QName,item()*) | Parameters map |
{any-name}* = string | Additional attributes |
/> |
The document properties will be applied to the result. The parameters may be used during document access.
href
The document will be retrieved from this URI.
content-type
Identifies the (required) content type of the document, for example application/json or text/plain. If not specified, the content type is inferred from the URI.
document-properties
Document properties are name/value pairs associated with the document. Unqualified property names, like base-uri, are defined by the XProc specification. You can add arbitrary namespace qualified properties.
parameters
The p:document instruction is defined in terms of the p:load step. The parameters passed to it can be used by the load step to aid in retrieving the document. For example, a username and password might be passed as parameters.
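The following sketch shows a p:document binding with an explicit content type and one extra document property. The file name config.json and the ex:origin property are hypothetical; the property key is built with QName() so that the map key is unambiguously a QName.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns"
                version="3.1">
  <p:output port="result"/>

  <p:identity>
    <p:with-input>
      <!-- Load a (hypothetical) JSON file and attach a namespace
           qualified property to the resulting document. -->
      <p:document href="config.json"
                  content-type="application/json"
                  document-properties="map{QName('http://example.com/ns', 'ex:origin'): 'local'}"/>
    </p:with-input>
  </p:identity>
</p:declare-step>
```

When no properties or content type are needed, <p:with-input href="config.json"/> is the shorter equivalent.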
2.2.2. Inline inputs
The p:inline
element connects an input to a document placed
into the pipeline directly.
<p:inline | |
{any-name}* = string | Additional attributes |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
content-type? = string | The content type of the inline |
document-properties? = map(xs:QName,item()*) | Document properties map |
encoding? = string | Encoded content (base64) |
> | |
anyNode* | |
</p:inline> |
The p:inline
element can be omitted in the simple case of a
single XML document.
content-type
Identifies the (required) content type of the content, for example application/json or text/plain. If not specified, the content type is assumed to be XML.
document-properties
Document properties are name/value pairs associated with the document. Unqualified property names, like base-uri, are defined by the XProc specification. You can add arbitrary namespace qualified properties.
encoding
The only supported value for encoding is base64. This encoding allows an inline to be in a different character set than the XML or to contain non-XML characters.
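Here is a sketch of two non-XML inlines. The JSON content is invented for the example, and the base64 string is the encoding of “Hello, world!”. Note that curly braces in an inline are normally treated as value templates, so literal braces in the JSON are doubled.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.1">
  <p:output port="result" sequence="true"/>

  <p:identity>
    <p:with-input>
      <!-- Doubled braces stand for literal braces in the JSON text. -->
      <p:inline content-type="application/json">{{"status": "draft", "count": 3}}</p:inline>

      <!-- The body is base64; it is decoded before the document is created. -->
      <p:inline content-type="text/plain" encoding="base64">SGVsbG8sIHdvcmxkIQ==</p:inline>
    </p:with-input>
  </p:identity>
</p:declare-step>
```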
This p:inline element specifies that its content is text. Text value
templates (the expressions in curly braces) in the body are evaluated and
replaced by their results.
|<p:inline content-type="text/plain">Processed {$total} files; {f:is-leap-day()} </p:inline>
2.2.3. Pipe inputs
The p:pipe
element (or the pipe
attribute
on p:with-input
) connects an input to the output from some other step.
If step
is omitted, the step associated with the default
readable port is assumed. If port
is omitted, the primary output
port of the step is assumed. (Consequently, <p:pipe/>
is a connection
to the default readable port.) It’s an error to attempt to refer to the default readable
port if there isn’t one.
<p:pipe | |
step? = NCName | The step name |
port? = NCName | The port name |
{any-name}* = string | Additional attributes |
/> |
The p:identity step in the viewport uses a pipe attribute to make a pipe binding:
|<p:with-input pipe="copyright@main"/>
The variable declaration for $total uses the p:pipe element to make a pipe
binding. In the context of a variable, this establishes the context item used
when evaluating the expression.
|<p:pipe step="listing"/>
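The pipe attribute can carry several bindings at once. In this sketch, the port name “extra” and the step name “copy” are assumptions; each token has the form port@step, and either part may be left out.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <p:input port="source" primary="true"/>
  <p:input port="extra"/>
  <p:output port="result" sequence="true"/>

  <!-- Reads the primary input of "main" through the default readable port. -->
  <p:identity name="copy"/>

  <p:identity>
    <!-- Shorthand for:
           <p:pipe step="main" port="extra"/>
           <p:pipe step="copy"/>
         "@copy" omits the port, so the primary output of "copy" is used. -->
    <p:with-input pipe="extra@main @copy"/>
  </p:identity>
</p:declare-step>
```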
2.2.4. Empty inputs
The p:empty
element explicitly binds the input to an empty sequence
of documents.
<p:empty | |
{any-name}* = string | Additional attributes |
/> |
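One use for p:empty is to stop a step from connecting to the default readable port. A minimal sketch, using p:count purely for illustration:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Without the explicit binding, p:count would read the pipeline's
       source port. With p:empty it receives no documents at all, so
       the count it reports should be zero. -->
  <p:count>
    <p:with-input>
      <p:empty/>
    </p:with-input>
  </p:count>
</p:declare-step>
```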
2.3. Connecting option values to steps
In much the same way as inputs are provided using p:with-input
, option values
are provided using p:with-option
.
<p:with-option | |
name = EQName | The option name |
as? = XPathSequenceType | The required value type |
select = XPathExpression | Option value |
collection? = boolean | Inputs as default collection? |
href? = { anyURI } | Document URI |
pipe? = string | A pipe binding |
{any-name}* = string | Additional attributes |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
> | |
((empty | (document | pipe | inline)*) | anyElement*) | |
</p:with-option> |
Options can also be specified as attributes
on the step itself, in which case the attribute name is the option name and its
value is interpreted as an attribute value template. For example,
in Example 2.1, “A compound step declaration”, the path and
include-filter options are set this way:
|<p:directory-list name="listing" path="{$path}"
| include-filter=".*\.xml$">
name
The option name. This must be the name of an option declared for the step on which it is used.
as
The type of the option, for example “xs:integer” or “map(xs:string, xs:dateTime)”.
select
The select expression is evaluated to provide a value for the option.
collection
If the collection attribute is true, all of the documents that appear on the context binding are placed in the default collection for the expression. An expression can only refer to the context item if it is a single value, but by using the default collection, an option can handle a sequence of values.
href
Shortcut for a single p:document binding.
pipe
Shortcut for one or more p:pipe bindings.
exclude-inline-prefixes
When you put XML in a p:inline element, all of the in-scope namespaces will apply to those elements. You can use exclude-inline-prefixes to exclude some of them. The value of the attribute must be a space-separated list of in-scope namespace prefixes. It’s an error to refer to a prefix that doesn’t have an in-scope declaration. The tokens #default and #all may also be used to exclude the default namespace and all namespaces, respectively.
In Example 2.1, “A compound step declaration”, the max-depth option is set on
the p:directory-list using p:with-option:
|<p:with-option name="max-depth"
| select="if ($recurse) then 'unbounded' else '1'"/>
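The collection attribute is easiest to understand with an example. This is only a sketch: the doc element and the attribute name are invented, and it assumes the pipeline’s source port receives a sequence of documents.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <p:input port="source" sequence="true"/>
  <p:output port="result"/>

  <!-- Two options are given as attributes (value templates)... -->
  <p:add-attribute match="/*" attribute-name="inputs">
    <p:with-input>
      <doc/>
    </p:with-input>
    <!-- ...and one with p:with-option. Because collection is true, every
         document on the pipe binding is placed in the default collection,
         so the expression can see the whole sequence. -->
    <p:with-option name="attribute-value"
                   select="string(count(collection()))"
                   collection="true" pipe="source@main"/>
  </p:add-attribute>
</p:declare-step>
```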
2.4. Compound steps
The XProc specification defines several compound steps, steps that contain subpipelines. XML Calabash also implements a couple of additional compound steps. (Pipelines that use extension compound steps are probably not, strictly speaking, conformant with the specification.)
2.4.1. Looping over inputs (p:for-each)
The p:for-each step loops over a sequence of documents, processing each
with its subpipeline.
<p:for-each | |
name? = NCName | The step name |
{any-name}* = string | Additional attributes |
> | |
(with-input?, output*, subpipeline) | |
</p:for-each> |
Unlike XPath, if a select expression is used on the p:with-input, the nodes
selected from the original document(s) are not what the loop iterates over.
Instead, a whole new document is constructed for each selection and that
document is processed. This means, for example, that “@name” can’t be used to
test an attribute on the loop input. You need to use “/*/@name” or something
similar instead.
The p:for-each
in Example 2.9, “Looping with for-each” loops over the
c:file
elements from the p:directory-list
step, but each
one will be in its own document.
|<p:for-each name="loop">
| <p:with-input select="//c:file"/>
|…
|</p:for-each>
In principle, a p:for-each
can process its inputs in parallel.
XML Calabash does not do so at this time.
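A complete, if small, loop might be sketched like this. The directory path “.” and the output text are assumptions; the variable follows the same /*/@name pattern used in Example 2.1.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.1">
  <p:output port="result" sequence="true"/>

  <p:directory-list path="."/>

  <p:for-each>
    <!-- Each c:file element becomes a new document of its own. -->
    <p:with-input select="//c:file"/>

    <!-- The c:file element is now the document element, hence /*/@name. -->
    <p:variable name="filename" select="string(/*/@name)"/>

    <!-- The loop produces one text document per file. -->
    <p:identity>
      <p:with-input>
        <p:inline content-type="text/plain">Found {$filename}</p:inline>
      </p:with-input>
    </p:identity>
  </p:for-each>
</p:declare-step>
```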
2.4.2. Changing internal structures (p:viewport)
The p:viewport
step replaces sections of a document with the result
of processing (a document containing) those sections using the subpipeline.
<p:viewport | |
name? = NCName | The step name |
match = XSLTSelectionPattern | Match pattern for content to replace |
{any-name}* = string | Additional attributes |
> | |
(with-input?, output?, subpipeline) | |
</p:viewport> |
The p:viewport
in Example 2.10, “Matching with viewport” matches
each copyright
element where the holder is “Someone Random” and replaces
that element with the result of its subpipeline, in this case, with the contents
of the document on the copyright input port.
|<p:viewport match="copyright[. = 'Someone Random']">
|  <p:identity>
|    <p:with-input pipe="copyright@main"/>
|  </p:identity>
|</p:viewport>
Like select
on p:for-each
, the elements
matched by p:viewport
are available to the subpipeline as
new documents.
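When the subpipeline reads the matched content, rather than replacing it outright, the “new document” behaviour matters. In this sketch, the section element and the reviewed attribute are invented for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:viewport match="section">
    <!-- Inside the viewport, each matched section is the document
         element of its own document, so "/*" refers to the section. -->
    <p:add-attribute match="/*" attribute-name="reviewed"
                     attribute-value="true"/>
  </p:viewport>
</p:declare-step>
```

The result is the original document with each processed section put back in place of the one that was matched.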
2.4.3. Choosing among alternatives (p:choose)
The p:choose
step allows you to select processing among a number
of alternatives. At most one alternative will be used: either the first p:when
(in document order) for which the test expression is true or the p:otherwise
.
<p:choose | |
name? = NCName | The step name |
{any-name}* = string | Additional attributes |
> | |
(with-input?, ((when+, otherwise?) | (when*, otherwise))) | |
</p:choose> |
The example pipeline in Example 2.11, “An example choice” uses
p:choose
to select which stylesheet to use when formatting a document
based on the document status.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                xmlns:ex="http://example.com/ns"
|                name="main" version="3.1" type="ex:format">
|  <p:input port="source"/>
|  <p:output port="result"/>
|
|  <p:choose>
|    <p:when test="/*/@status = 'draft'">
|      <p:xslt>
|        <p:with-input port="stylesheet" href="draft.xsl"/>
|      </p:xslt>
|    </p:when>
|    <p:when test="/*/@status = 'final'">
|      <p:with-input pipe="source"/>
|      <p:xslt>
|        <p:with-input port="stylesheet" href="final.xsl"/>
|      </p:xslt>
|    </p:when>
|    <p:otherwise>
|      <p:error code="ex:bad-status">
|        <p:with-input>
|          <p:inline content-type="text/plain">Unexpected status</p:inline>
|        </p:with-input>
|      </p:error>
|    </p:otherwise>
|  </p:choose>
|</p:declare-step>
Each alternative is a p:when
which contains the subpipeline to run
if this alternative is selected.
<p:when | |
name? = NCName | The step name |
test = XPathExpression | The test expression |
collection? = boolean | Inputs as default collection? |
{any-name}* = string | Additional attributes |
> | |
(with-input?, output*, subpipeline) | |
</p:when> |
The first (and only the first) p:when
where the test expression evaluates
to true is used.
name
The step name is only used by the subpipeline in the p:when. It’s how a step in the subpipeline can refer, for example with p:pipe, to the context input.
test
The test expression. This p:when is selected if it is the first p:when, in document order, where the expression evaluates to true. It’s an error if the expression refers to the context item and the p:with-input does not provide exactly one document.
collection
If the collection attribute is true, all of the documents that appear on the context binding are placed in the default collection for the expression. An expression can only refer to the context item if it is a single value, but by using the default collection, a test expression can handle a sequence of values.
The first p:when
selects documents with a status of “draft”:
|<p:when test="/*/@status = 'draft'">
|  <p:xslt>
|    <p:with-input port="stylesheet" href="draft.xsl"/>
|  </p:xslt>
|</p:when>
The second p:when
selects documents with a status of “final”.
This example includes an explicit binding for the input that will be used to set
the context item for the test expression. It’s unnecessary as it’s the same as the default
readable port in this case.
|<p:when test="/*/@status = 'final'">
|  <p:with-input pipe="source"/>
|  <p:xslt>
|    <p:with-input port="stylesheet" href="final.xsl"/>
|  </p:xslt>
|</p:when>
The p:otherwise
contains the subpipeline to run
if no other alternative is selected.
<p:otherwise | |
name? = NCName | The step name |
{any-name}* = string | Additional attributes |
> | |
(output*, subpipeline) | |
</p:otherwise> |
The p:otherwise in this example raises an error if the status is neither draft nor final:
|<p:otherwise>
|  <p:error code="ex:bad-status">
|    <p:with-input>
|      <p:inline content-type="text/plain">Unexpected status</p:inline>
|    </p:with-input>
|  </p:error>
|</p:otherwise>
2.4.4. Simple conditionals (p:if)
The p:if is a simplified form of p:choose. If the test expression is true, then the subpipeline is run and that determines the output from the step. If the expression is false, p:if operates as an identity step, passing its input through unchanged.
<p:if | |
name? = NCName | The step name |
test = XPathExpression | The test expression |
collection? = boolean | Inputs as default collection? |
{any-name}* = string | Additional attributes |
> | |
(with-input?, output*, subpipeline) | |
</p:if> |
The notes about the attributes on p:when apply to p:if.
The p:if in Example 2.12, “Using an if instruction” deletes the count attribute if it has the value “0”.
|<p:if test="xs:integer(/doc/@count) = 0">
|  <p:delete match="/doc/@count"/>
|</p:if>
If the attribute has any other value (or isn’t present), the document passes through as if the step were a p:identity step.
2.4.5. Grouping (p:group)
The p:group step is just a wrapper around a subpipeline.
<p:group | |
name? = NCName | The step name |
{any-name}* = string | Additional attributes |
> | |
(output*, subpipeline) | |
</p:group> |
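As a hedged illustration, one reason to wrap steps in a p:group is to scope a variable to just those steps. The fragment below is a sketch, not taken from this reference: the step name normalize, the variable today, and the attribute name processed are all hypothetical, and the fragment is assumed to sit inside a p:declare-step that binds the p prefix.

```xml
<!-- Sketch only: "normalize", "today", and "processed" are made-up names. -->
<p:group name="normalize">
  <p:output port="result"/>

  <!-- The variable is visible only to steps inside this group. -->
  <p:variable name="today" select="current-date()"/>

  <!-- Stamps the document element of the default readable port. -->
  <p:add-attribute attribute-name="processed"
                   attribute-value="{$today}"/>
</p:group>
```

Steps after the group cannot refer to $today, which keeps helper variables from leaking into the rest of the pipeline.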
2.4.6. Exception handling (p:try)
The p:try step allows a pipeline author to catch errors and recover from them. The subpipeline is run. If no errors occur, the result of the step is the result of that subpipeline. If an error does occur, each p:catch is tested in turn and the result of the step is the result of the first matching p:catch.
<p:try | |
name? = NCName | The step name |
{any-name}* = string | Additional attributes |
> | |
(output*, subpipeline, ((catch+, finally?) | (catch*, finally))) | |
</p:try> |
The example for p:choose raises an error if the status is neither “draft” nor “final”. We can use p:try to recover from that error and treat any such document as if it had draft status, as shown in Example 2.13, “An example try/catch”.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                xmlns:ex="http://example.com/ns"
|                name="main" version="3.1">
|  <p:import href="choose.xpl"/>
|
|  <p:input port="source"/>
|  <p:output port="result"/>
|
|  <p:try>
|    <ex:format/>
|    <p:catch code="ex:bad-status">
|      <p:xslt>
|        <p:with-input port="source" pipe="source@main"/>
|        <p:with-input port="stylesheet" href="draft.xsl"/>
|      </p:xslt>
|    </p:catch>
|  </p:try>
|</p:declare-step>
A p:catch matches an error if the error code is in its code list, or if it does not have a code attribute at all. If there are no matching catches, the p:try step fails with the error (which may be caught and handled by some p:try among its ancestors, if it has any).
<p:catch | |
name? = NCName | The step name |
code? = EQNameList | The error codes to catch |
{any-name}* = string | Additional attributes |
> | |
(output*, subpipeline) | |
</p:catch> |
Note that the default readable port inside the p:catch is the error document produced by the failed pipeline. You have to make an explicit binding if you want something else.
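The difference between the two bindings can be sketched as follows. This fragment is illustrative only. It assumes the surrounding pipeline from Example 2.13 (a step named main with a source port, and ex:format imported from choose.xpl), and it places the catch without a code attribute last.

```xml
<!-- Sketch only: relies on the "main" pipeline and ex:format
     from Example 2.13. -->
<p:try>
  <ex:format/>

  <!-- Matches only ex:bad-status; explicitly rebinds to the original
       source, so the error document is ignored. -->
  <p:catch code="ex:bad-status">
    <p:identity>
      <p:with-input pipe="source@main"/>
    </p:identity>
  </p:catch>

  <!-- No code attribute: matches any other error. With no explicit
       binding, p:identity reads the default readable port, which is
       the error document. -->
  <p:catch>
    <p:identity/>
  </p:catch>
</p:try>
```

With this arrangement, a bad status yields the original document, while any other failure yields the error document itself, which can be handy when debugging.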
Irrespective of whether the subpipeline succeeds or fails and whether or not a catch is invoked (and whether or not it succeeds or fails), the p:finally subpipeline will be run.
<p:finally | |
name? = NCName | The step name |
{any-name}* = string | Additional attributes |
> | |
(output*, subpipeline) | |
</p:finally> |
It is very uncommon for this to be useful. One plausible use case is for the p:finally subpipeline to clean up any side effects that might have been introduced by the main subpipeline or the p:catch subpipelines, for example, by deleting a temporary file or closing a database connection.
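A minimal sketch of the temporary-file case might look like the fragment below. It is an assumption-laden illustration, not part of this reference: the file names report.xsl and scratch.tmp and the failed element are hypothetical, and it assumes the processor implements the p:file-delete step from the optional file steps library. The final p:sink discards the result of p:file-delete so that the p:finally does not end with an unconnected primary output.

```xml
<!-- Sketch only: "report.xsl", "scratch.tmp", and <failed/> are
     made-up names. -->
<p:try>
  <p:xslt>
    <p:with-input port="stylesheet" href="report.xsl"/>
  </p:xslt>

  <!-- Catch-all: substitute a placeholder document on any error. -->
  <p:catch>
    <p:identity>
      <p:with-input><failed/></p:with-input>
    </p:identity>
  </p:catch>

  <!-- Runs whether or not the transformation or the catch succeeded. -->
  <p:finally>
    <p:file-delete href="scratch.tmp" fail-on-error="false"/>
    <p:sink/>
  </p:finally>
</p:try>
```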
2.4.7. Loop until a condition is true (cx:until)
This is an extension compound step that processes single documents, applying its subpipeline until the test expression is true.
<cx:until | |
name? = NCName | The step name |
test = XPathExpression | The test expression |
{any-name}* = string | Additional attributes |
> | |
(with-input?, output?, subpipeline) | |
</cx:until> |
The test attribute specifies an XPath expression. The subpipeline is always run at least once and the condition is only tested at the end of the loop. The result of the subpipeline is provided as the context item. The previous result is provided in the variable cx:previous.
The pipeline in Example 2.14, “Looping until a condition is true” demonstrates a horribly inefficient way to add explicit numbers to a list.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                exclude-inline-prefixes="#all"
|                name="main" version="3.1">
|  <p:output port="result" serialization="map{'indent':true()}"/>
|
|  <p:identity name="identity">
|    <p:with-input>
|      <list>
|        <item/>
|        <item/>
|        <item/>
|      </list>
|    </p:with-input>
|  </p:identity>
|
|  <cx:until test="deep-equal(., $cx:previous)">
|    <p:replace match="/list/item[1]">
|      <p:with-input port="replacement">
|        <li number="{p:iteration-position()}"/>
|      </p:with-input>
|    </p:replace>
|  </cx:until>
|
|</p:declare-step>
The first time through the loop, the first item is replaced. The next time through, the next item is replaced, etc. Looping stops when nothing changes in the document (that is, when all the items have been replaced).
The result of the cx:until step is the first document for which the test expression is true. In this case, the result is:
|<list>
|  <li number="1"/>
|  <li number="2"/>
|  <li number="3"/>
|</list>
It is a dynamic error (err:XD0001) if the source is not a single document.
2.4.8. Loop while a condition is true (cx:while)
This is an extension compound step that processes single documents, applying its subpipeline while the test expression is true.
<cx:while | |
name? = NCName | The step name |
test = XPathExpression | The test expression |
{any-name}* = string | Additional attributes |
> | |
(with-input?, output?, subpipeline) | |
</cx:while> |
The somewhat contrived example in Example 2.15, “Looping while a condition is true” loops over the document adding a new first child until the count reaches zero.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                exclude-inline-prefixes="#all"
|                name="main" version="3.1">
|  <p:output port="result" serialization="map{'indent':true()}"/>
|
|  <p:identity name="identity">
|    <p:with-input>
|      <doc count="3"/>
|    </p:with-input>
|  </p:identity>
|
|  <cx:while test="/doc/@count and xs:integer(/doc/@count) gt 0">
|    <p:insert position="first-child">
|      <p:with-input port="insertion">
|        <insertion for="{/doc/@count}"/>
|      </p:with-input>
|    </p:insert>
|
|    <p:add-attribute attribute-name="count" attribute-value="{xs:integer(/doc/@count) - 1}"/>
|
|    <p:if test="xs:integer(/doc/@count) = 0">
|      <p:delete match="/doc/@count"/>
|    </p:if>
|  </cx:while>
|
|</p:declare-step>
The result of the cx:while step is the first document for which the test expression did not have an effective boolean value of true. In this case, the result is:
|<doc>
|  <insertion for="1"/>
|  <insertion for="2"/>
|  <insertion for="3"/>
|</doc>
It is a dynamic error (err:XD0001) if the source is not a single document.
The test attribute specifies an XPath expression. The document is provided as the context item. If the expression is false, the loop is not run (or run again).
2.5. Atomic steps
A great many pipelines that you write will be like shell scripts or “main” functions in other programming languages: they run, they do a thing, and they end. But in fact, with a little extra markup, every pipeline that you declare can also be reused as an atomic step elsewhere. In this way, there is an unbounded number of atomic steps: there are all of the standard ones (summarized in Part I, “Standard steps”), there are all of the extension steps that ship with XML Calabash (summarized in Part II, “Extension steps”), and then there are all the steps that you write.
2.6. Variables
A subpipeline may use p:variable to hold the result of a computation.
<p:variable | |
name = EQName | The variable name |
as? = XPathSequenceType | The required value type |
select = XPathExpression | Variable value |
collection? = boolean | Inputs as default collection? |
href? = { anyURI } | Document URI |
pipe? = string | A pipe binding |
{any-name}* = string | Additional attributes |
exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
> | |
((empty | (document | pipe | inline)*) | anyElement*) | |
</p:variable> |
The notes about the attributes on p:with-option apply to p:variable.
Expressions which occur later in the subpipeline may refer to the variable.
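A small sketch may make the scoping concrete. The pipeline below is illustrative only: the variable name status and the attribute name checked-status are hypothetical. It reads the status attribute of the source document into a variable and then refers to that variable in a later step.

```xml
<!-- Sketch only: "status" and "checked-status" are made-up names. -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Evaluated against the default readable port (the source).
       string() yields "" if the attribute is absent. -->
  <p:variable name="status" as="xs:string"
              select="string(/*/@status)"/>

  <!-- A later step can refer to the variable in an attribute value
       template. -->
  <p:add-attribute attribute-name="checked-status"
                   attribute-value="{$status}"/>
</p:declare-step>
```

A step placed before the p:variable could not refer to $status, because the variable is only in scope for what follows it in the subpipeline.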
2.7. Extra information
Documentation can be placed anywhere in the pipeline with the p:documentation element. It’s ignored by the processor.
<p:documentation | |
{any-name}* = string | Additional attributes |
> | |
any-well-formed-content* | |
</p:documentation> |
The p:pipeinfo element is intended for additional information that a particular processor might use. XML Calabash uses it for assertions, for example.
<p:pipeinfo | |
{any-name}* = string | Additional attributes |
> | |
any-well-formed-content* | |
</p:pipeinfo> |
The p:documentation and p:pipeinfo elements have no special significance inside a p:inline; in that context, they’re just inline elements.
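The two roles can be contrasted in a short, hedged sketch. The fragment is assumed to sit inside a p:declare-step that binds the p prefix, and the doc element is hypothetical.

```xml
<!-- Sketch only: <doc> is a made-up element. -->
<p:identity>
  <!-- Here p:documentation is a note for readers; the processor
       ignores it. -->
  <p:documentation>Supplies a fixed document.</p:documentation>
  <p:with-input>
    <p:inline>
      <doc>
        <!-- Inside p:inline it is ordinary content and is part of
             the document that the step produces. -->
        <p:documentation>This element appears in the output.</p:documentation>
      </doc>
    </p:inline>
  </p:with-input>
</p:identity>
```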