Chapter 2. Language reference
XProc is a data flow programming language described with an XML
vocabulary. This chapter provides an overview of the features of the language.
At a high level, XProc allows you to combine steps, units of computation, in a
variety of ways to achieve your goal. The p:for-each step, for
example, will iterate over a set of documents and the p:xslt step will
perform XSLT transformations.
Broadly speaking, the features are:
Structures for declaring pipelines,
structures for connecting steps to inputs,
compound steps,
atomic steps,
options,
variables,
and extra information.
The following sections give a brief overview of the elements in the XProc vocabulary.
This section is something of a work-in-progress. At the moment, it’s neither a comprehensive description of every aspect of the vocabulary, nor is it tutorial in nature. But it’s useful to have every element in the vocabulary present in the reference. Suggestions for improvements are welcome.
In the summaries that follow, {any-name}* generally means
any number of additional namespace qualified names. These are
roughly extension attributes and are ignored unless the processor uses them for
some implementation-defined purpose.
2.1. Declaring pipelines
The most common pipeline declaration specifies the inputs, outputs, and options that the step accepts, followed by the steps that implement the pipeline. Pipelines may also import libraries and functions, and may declare steps.
| <p:declare-step | |
| name? = NCName | The step name |
| type? = EQName | The step type (for reuse) |
| psvi-required? = boolean | Is XML Schema validated input required? |
| xpath-version? = decimal | The XPath version required |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| version? = decimal | The XProc version (3.0 or 3.1) |
| visibility? = private|public | Visible outside the library? |
| {any-name}* = string | Additional attributes |
| > | |
| (import | import-functions)*, (input | output | option)*, declare-step*, subpipeline? | |
| </p:declare-step> |
The example pipeline in Example 2.1, “A compound step declaration” shows a typical compound step declaration. In brief: it iterates over the files in a directory replacing selected copyright elements. We’ll look at several of its features in more detail below.
name: The step name is only used by the subpipeline in the declaration. It’s how a step in the subpipeline can refer, for example with p:pipe, to one of the step’s inputs.
|<p:declare-step name="main">
|  <p:input port="source"/>
|  …
|  <p:identity>
|    <p:with-input>
|      <p:pipe step="main" port="source"/>
|    </p:with-input>
|  </p:identity>
|  …
|</p:declare-step>
type: The step type is how you reuse a step. If you declare a step with the type ex:my-step, then you can subsequently use it as an atomic step, <ex:my-step>, in other steps, even recursively.
|<p:declare-step name="main" xmlns:ex="http://example.com/ns">
|  <p:input port="source"/>
|
|  <p:declare-step type="ex:my-step">
|    <p:input port="source"/>
|    …
|  </p:declare-step>
|  …
|  <ex:my-step>
|    <p:with-input pipe="source@main"/>
|  </ex:my-step>
|  …
|</p:declare-step>
psvi-required: If this is true, you’re telling the processor that XML Schema validated inputs are required. This will require Saxon EE.
xpath-version: This specifies the XPath version. The only version that you can use today is “3.1”, but in the future, it might be possible to specify other versions.
exclude-inline-prefixes: When you put XML in a p:inline element, all of the in-scope namespaces will apply to those elements. You can use exclude-inline-prefixes to exclude some of them. The value of the attribute must be a space-separated list of in-scope namespace prefixes. It’s an error to refer to a prefix that doesn’t have an in-scope declaration. The tokens #default and #all may also be used to exclude the default namespace and all namespaces, respectively.
version: The XProc version of the step. Only 3.0 and 3.1 are accepted, and they’re equivalent.
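As a minimal sketch of exclude-inline-prefixes, the pipeline below declares a namespace that the inline content doesn’t use and excludes it on the step declaration. The ex prefix and its namespace URI are illustrative assumptions.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns"
                exclude-inline-prefixes="ex"
                name="main" version="3.1">
  <p:output port="result"/>

  <!-- The ex binding is in scope for the inline below, but it is listed in
       exclude-inline-prefixes, so it is not carried onto the doc element. -->
  <p:identity>
    <p:with-input>
      <p:inline><doc>Hello</doc></p:inline>
    </p:with-input>
  </p:identity>
</p:declare-step>
```

Without the attribute, the ex namespace binding would apply to the doc element and could show up in the serialized result.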
The example pipelines and some input documents to demonstrate how they work are available from an examples directory in the repository.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                name="main" version="3.1">
|  <p:documentation>
|    <div xmlns="http://www.w3.org/1999/xhtml">
|      <p>This pipeline reads all of the files in a directory and
|      updates the copyright element.</p>
|    </div>
|  </p:documentation>
|
|  <p:input port="copyright" content-types="xml"/>
|  <p:output port="result" content-types="text"/>
|  <p:option name="path" required="true" as="xs:string"/>
|  <p:option name="output-path" required="true" as="xs:anyURI"/>
|  <p:option name="recurse" select="false()" as="xs:boolean"/>
|
|  <p:directory-list name="listing" path="{$path}"
|                    include-filter=".*\.xml$">
|    <p:with-option name="max-depth"
|                   select="if ($recurse) then 'unbounded' else '1'"/>
|  </p:directory-list>
|
|  <p:for-each name="loop">
|    <p:with-input select="//c:file"/>
|    <p:variable name="filename" select="/*/@name"/>
|
|    <p:load href="{resolve-uri(/*/@name, base-uri(/*))}"/>
|
|    <p:viewport match="copyright[. = 'Someone Random']">
|      <p:identity>
|        <p:with-input pipe="copyright@main"/>
|      </p:identity>
|    </p:viewport>
|
|    <p:store href="{resolve-uri($filename, resolve-uri($output-path, static-base-uri()))}"/>
|  </p:for-each>
|
|  <p:variable name="total" select="count(//c:file)">
|    <p:pipe step="listing"/>
|  </p:variable>
|
|  <p:identity>
|    <p:with-input xmlns:f="http://example.com/ns/functions">
|      <p:inline content-type="text/plain">Processed {$total} files; {f:is-leap-day()} </p:inline>
|    </p:with-input>
|  </p:identity>
|</p:declare-step>
A simpler form of declaration specifies the inputs, outputs, and options that the step accepts, but relies on the implementation having been provided through some other means. The XML Calabash extension steps, or extension steps that you write in a JVM language yourself, follow this pattern.
| <p:declare-step | |
| name? = NCName | The step name |
| type? = EQName | The step type (for reuse) |
| psvi-required? = boolean | Is XML Schema validated input required? |
| xpath-version? = decimal | The XPath version required |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| version? = decimal | The XProc version (3.0 or 3.1) |
| visibility? = private|public | Visible outside the library? |
| {any-name}* = string | Additional attributes |
| > | |
| (input | output | option)* | |
| </p:declare-step> |
An example atomic declaration is shown in Example 2.2, “An atomic step declaration”.
This is the declaration for the extension step cx:fileset.
|<p:declare-step type="cx:fileset" version="3.1"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema">
|  <p:input port="source" content-types="xml" sequence="true">
|    <p:empty/>
|  </p:input>
|  <p:output port="result" content-types="xml" sequence="true"/>
|  <p:option name="path" as="xs:string" required="true"/>
|  <p:option name="default-excludes" as="xs:boolean" select="true()"/>
|  <p:option name="case-sensitive" as="xs:boolean" select="true()"/>
|  <p:option name="error-on-missing-dir" as="xs:boolean" select="true()"/>
|  <p:option name="follow-symlinks" as="xs:boolean" select="true()"/>
|  <p:option name="includes" as="xs:string?"/>
|  <p:option name="excludes" as="xs:string?"/>
|  <p:option name="detailed" as="xs:boolean" select="false()"/>
|</p:declare-step>
2.1.1. Pipeline inputs
The p:input element describes a step input.
| <p:input | |
| port = NCName | The port name |
| sequence? = boolean | Accept a (possibly empty) sequence of documents? |
| primary? = boolean | This is the primary port? |
| select? = XPathExpression | XPath selection from the inputs |
| content-types? = ContentTypes | Acceptable content types |
| href? = { anyURI } | A document binding |
| {any-name}* = string | Additional attributes |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| > | |
| ((empty | (document | inline)*) | anyElement*) | |
| </p:input> |
The input in Example 2.3, “A pipeline input” has the port name “copyright” and
accepts only XML documents. If it’s the only input, it will be primary. The input
doesn’t specify sequence="true", so a single document is required.
port: The port name is how you refer to an input or output port. Its name must be unique.
sequence: If a sequence is allowed, any number of documents can be used on that port. If not, exactly one document must be used.
primary: If a port is primary, that’s where implicit connections are made. If there’s only one input or output port, it will be primary by default. If there’s more than one, none are primary unless indicated explicitly. At most one input port and one output port may be declared primary.
select: A select expression on an input matches the selected nodes and creates an input document for each. Matching, for example, //chapter will make the input a sequence of documents, one for each chapter element that appears on the original input.
content-types: If a list of content types is provided, only documents that are of those types are allowed. Note that XProc allows both positive and negative content types.
href: Shortcut for a single p:document binding.
exclude-inline-prefixes: You can use exclude-inline-prefixes to exclude some namespaces from a p:inline, as noted about the attributes of p:declare-step. If the exclude-inline-prefixes attribute occurs multiple times among the ancestors of an inline, the effect is the union of all such prefixes.
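The following sketch combines several of these attributes on a step with two inputs. Because there is more than one input port, one is marked primary explicitly. The settings port and the chapter element are hypothetical names chosen for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <!-- Explicitly primary; the select turns each chapter element
       into a separate input document. -->
  <p:input port="source" primary="true" sequence="true" select="//chapter"/>

  <!-- A second, non-primary input that only accepts JSON documents. -->
  <p:input port="settings" content-types="json"/>

  <p:output port="result" sequence="true"/>

  <!-- Connects implicitly to the primary input, source. -->
  <p:identity/>
</p:declare-step>
```

The p:identity step has no explicit binding, so it reads from the primary input and passes along one document per selected chapter.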
|<p:input port="copyright" content-types="xml"/>
2.1.2. Pipeline outputs
The p:output element describes a step output. There
are three slightly different forms, depending on where the output is being
declared. The pipeline and compound step forms are the same except that
on a pipeline, serialization parameters may also be declared.
| <p:output | |
| port? = NCName | The port name |
| sequence? = boolean | Accept a (possibly empty) sequence of documents? |
| primary? = boolean | This is the primary port? |
| content-types? = ContentTypes | Acceptable content types |
| href? = { anyURI } | A document binding |
| pipe? = string | A pipe binding |
| {any-name}* = string | Additional attributes |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| serialization? = map(xs:QName,item()*) | Serialization options |
| > | |
| ((empty | (document | pipe | inline)*) | anyElement*) | |
| </p:output> |
The output in Example 2.4, “A pipeline output” has the port name “result”. The pipeline result must be a single text document.
The notes about the attributes on p:input apply to p:output.
In addition, p:output has a pipe attribute that
is a shortcut for p:pipe bindings. On a p:declare-step it
may also have a map of serialization parameters.
|<p:output port="result" content-types="text"/>
On a compound step, no serialization can occur, so it would be pointless to specify serialization parameters.
| <p:output | |
| port? = NCName | The port name |
| sequence? = boolean | Accept a (possibly empty) sequence of documents? |
| primary? = boolean | This is the primary port? |
| content-types? = ContentTypes | Acceptable content types |
| href? = { anyURI } | A document binding |
| pipe? = string | A pipe binding |
| {any-name}* = string | Additional attributes |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| > | |
| ((empty | (document | pipe | inline)*) | anyElement*) | |
| </p:output> |
On the declaration of an atomic step, you also cannot provide any connections.
| <p:output | |
| port? = NCName | The port name |
| sequence? = boolean | Accept a (possibly empty) sequence of documents? |
| primary? = boolean | This is the primary port? |
| content-types? = ContentTypes | Acceptable content types |
| {any-name}* = string | Additional attributes |
| /> |
The output on the cx:fileset declaration indicates that
the step can produce a sequence of XML documents.
|<p:output port="result" content-types="xml" sequence="true"/>
2.1.3. Pipeline options
Options are declared with the p:option element.
| <p:option | |
| name = EQName | The option name |
| as? = XPathSequenceType | The required value type |
| values? = string | Allowed values |
| static? = boolean | Static value? |
| required? = boolean | Option value is required? |
| select? = XPathExpression | Default value |
| {any-name}* = string | Additional attributes |
| visibility? = private|public | Visible outside the library? |
| /> |
Options can be required or have a default value and they can be typed.
name: The option name. Options cannot shadow earlier options; they must have unique names. Non-static options can be shadowed in the subpipeline by p:variable elements.
as: The type of the option, for example “xs:integer” or “map(xs:string, xs:dateTime)”.
values: A list of atomic values. This forms an enumeration and the option must be one of these values.
static: Is the option evaluated at compile time? Static options can be used in use-when expressions for conditional element exclusion.
required: If an option is required, the caller must provide a value for it. If it isn’t required, and no value is provided, the default value is taken from the select attribute. (If there’s no select attribute, the default value is the empty sequence.)
select: Provides a default value for the option. Option default values can refer to preceding options, but not to the step inputs.
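A short sketch of typed options with defaults, an enumeration, and a default that refers to preceding options. The option names format, copies, and label are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                name="main" version="3.1">
  <p:output port="result"/>

  <!-- Enumerated option: the value must be one of the listed strings. -->
  <p:option name="format" as="xs:string"
            values="('html', 'pdf', 'epub')" select="'html'"/>

  <p:option name="copies" as="xs:integer" select="1"/>

  <!-- The default may refer to options declared earlier. -->
  <p:option name="label" as="xs:string"
            select="concat($format, ' x ', $copies)"/>

  <p:identity>
    <p:with-input>
      <p:inline content-type="text/plain">{$label}</p:inline>
    </p:with-input>
  </p:identity>
</p:declare-step>
```

Run with no options supplied, this should produce a text document built from the defaults. Supplying a format that is not in the enumeration is expected to be rejected by the processor.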
The compound step declaration in Example 2.1, “A compound step declaration” declares three options:
|<p:option name="path" required="true" as="xs:string"/>
|<p:option name="output-path" required="true" as="xs:anyURI"/>
|<p:option name="recurse" select="false()" as="xs:boolean"/>
The path and output-path options are required and must be
a string and a URI, respectively. (There’s no practical reason to make them different types
in this case; it’s just to make the example more interesting.)
The recurse option is not required and will default to “false”.
Options can be declared static:
| <p:option | |
| name = EQName | The option name |
| as? = XPathSequenceType | The required value type |
| values? = string | Allowed values |
| static = "true" | Static value? |
| select = XPathExpression | Default value |
| {any-name}* = string | Additional attributes |
| visibility? = private|public | Visible outside the library? |
| /> |
Options inside a p:library must be declared static.
It must be possible to evaluate a static option without reference to any pipeline
input documents. It is evaluated “at compile time”. You may not shadow a static option
with another option or p:variable.
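As an illustration, a static option can drive conditional inclusion with use-when. This is a sketch under the assumption of a hypothetical debug option; how a caller overrides a static option depends on the processor.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                name="main" version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Evaluated at compile time; it cannot depend on input documents. -->
  <p:option name="debug" static="true" as="xs:boolean" select="false()"/>

  <!-- This step is only part of the pipeline when $debug is true. -->
  <p:add-attribute use-when="$debug" match="/*"
                   attribute-name="debug" attribute-value="true"/>

  <p:identity/>
</p:declare-step>
```

When debug is false, the p:add-attribute element is excluded before the pipeline runs, so the input passes straight through the p:identity step.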
2.1.4. Declaring libraries of steps
Several steps (and static options) can be bundled together in a library.
| <p:library | |
| psvi-required? = boolean | Is XML Schema validated input required? |
| xpath-version? = decimal | The XPath version required |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| version? = decimal | The XProc version (3.0 or 3.1) |
| {any-name}* = string | Additional attributes |
| > | |
| (import | import-functions)*, option*, declare-step* | |
| </p:library> |
A collection of step declarations and (static) options can be put inside
a p:library so that they can all be imported together.
The notes about the attributes on p:declare-step apply to
p:library.
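A minimal library might look like the sketch below. The ex namespace, the step types ex:mark and ex:unmark, and the ex:marker option are all hypothetical names used for illustration.

```xml
<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://example.com/ns"
           version="3.1">

  <!-- Options in a library must be static. -->
  <p:option name="ex:marker" static="true" as="xs:string" select="'checked'"/>

  <p:declare-step type="ex:mark">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:add-attribute match="/*" attribute-name="status"
                     attribute-value="{$ex:marker}"/>
  </p:declare-step>

  <p:declare-step type="ex:unmark">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:delete match="/*/@status"/>
  </p:declare-step>
</p:library>
```

A pipeline that imports this library can then use ex:mark and ex:unmark as if they were atomic steps.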
2.1.5. Importing libraries and function libraries
A library (or a single step) can be imported.
| <p:import | |
| {any-name}* = string | Additional attributes |
| href = anyURI | Document URI |
| /> |
The document URI must identify a pipeline or library document.
XML Calabash provides URIs for importing its extension steps. For example,
the cx:fileset extension step can be imported from
https://xmlcalabash.com/ext/library/fileset.xpl. Generally speaking, the specified URI is
retrieved and parsed for its declarations. The library URIs that XML Calabash
provides for its extension steps are resolved internally, without accessing the
internet.
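The sketch below imports the cx:fileset library by the URI given above and then calls the step. The option names come from the declaration in Example 2.2; the pattern syntax shown for includes is an assumption.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                name="main" version="3.1">
  <!-- Imports must come before inputs, outputs, and options. -->
  <p:import href="https://xmlcalabash.com/ext/library/fileset.xpl"/>

  <p:output port="result" sequence="true"/>
  <p:option name="path" required="true"/>

  <!-- The value of includes is illustrative; check the step
       documentation for the pattern syntax it accepts. -->
  <cx:fileset path="{$path}" includes="*.xml"/>
</p:declare-step>
```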
XML Calabash can also import functions defined in XSLT or XQuery.
| <p:import-functions | |
| {any-name}* = string | Additional attributes |
| href = anyURI | Document URI |
| content-type? = ContentType | The content type |
| namespace? = string | The namespace(s) to import |
| /> |
Importing functions allows them to be used in expressions in the pipeline.
href: The document URI must identify a library of functions.
content-type: If a content-type is provided, it informs the processor what is expected from the imported library.
namespace: A whitespace-separated list of namespace URIs. If provided, only functions declared in one of those namespaces will be imported.
The stylesheet in Example 2.7, “A function library in XSLT” defines
two functions, f:is-leap-day with no arguments and
f:is-leap-day with a single date argument.
|<?xml version="1.0" encoding="utf-8"?>
|<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                xmlns:f="http://example.com/ns/functions"
|                exclude-result-prefixes="xs"
|                version="3.0">
|
|<xsl:function name="f:is-leap-day">
|  <xsl:sequence select="f:is-leap-day(current-date())"/>
|</xsl:function>
|
|<xsl:function name="f:is-leap-day">
|  <xsl:param name="date"/>
|  <xsl:choose>
|    <xsl:when test="$date instance of xs:date">
|      <xsl:sequence select="month-from-date($date) = 2
|                            and day-from-date($date) = 29"/>
|    </xsl:when>
|    <xsl:when test="$date instance of xs:dateTime">
|      <xsl:sequence select="month-from-dateTime($date) = 2
|                            and day-from-dateTime($date) = 29"/>
|    </xsl:when>
|    <xsl:when test="$date castable as xs:date">
|      <xsl:variable name="dt" select="$date cast as xs:date"/>
|      <xsl:sequence select="month-from-date($dt) = 2
|                            and day-from-date($dt) = 29"/>
|    </xsl:when>
|    <xsl:when test="$date castable as xs:dateTime">
|      <!-- $dt is an xs:dateTime here, so use the dateTime accessors -->
|      <xsl:variable name="dt" select="$date cast as xs:dateTime"/>
|      <xsl:sequence select="month-from-dateTime($dt) = 2
|                            and day-from-dateTime($dt) = 29"/>
|    </xsl:when>
|    <xsl:otherwise>
|      <xsl:sequence select="false()"/>
|    </xsl:otherwise>
|  </xsl:choose>
|</xsl:function>
|
|</xsl:stylesheet>
After the library has been imported, the functions that it defines can be used in XPath expressions in the pipeline. The pipeline in Example 2.8, “Using the function library” outputs a single document that answers the questions, “is today a leap day and is 29 February 2028 a leap day?”
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:f="http://example.com/ns/functions"
|                name="main" version="3.1">
|  <p:documentation>
|    <div xmlns="http://www.w3.org/1999/xhtml">
|      <p>Example of importing functions. This requires Saxon EE.</p>
|    </div>
|  </p:documentation>
|
|  <p:import-functions href="is-leap-day.xsl"/>
|
|  <p:output port="result" serialization="map{'indent':true()}"/>
|
|  <p:identity>
|    <p:with-input exclude-inline-prefixes="#all">
|      <leap-days>
|        <today date="{substring(string(current-date()), 1, 10)}"
|               >{f:is-leap-day()}</today>
|        <other date="2028-02-29"
|               >{f:is-leap-day('2028-02-29')}</other>
|      </leap-days>
|    </p:with-input>
|  </p:identity>
|</p:declare-step>
If you have Saxon EE and you run the pipeline, the output will be something like:
<leap-days>
  <today date="2025-07-26">false</today>
  <other date="2028-02-29">true</other>
</leap-days>
Note that the p:output element in the pipeline uses the serialization
options to pretty-print the output and the p:with-input uses
exclude-inline-prefixes to avoid having the namespace
declaration for “f” on the output.
2.2. Connecting steps to inputs
Atomic and compound steps use p:with-input to describe how
their inputs are connected.
| <p:with-input | |
| port? = NCName | The port name |
| select? = XPathExpression | XPath selection from the inputs |
| href? = { anyURI } | Document URI |
| pipe? = string | A pipe binding |
| {any-name}* = string | Additional attributes |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| > | |
| ((empty | (document | pipe | inline)*) | anyElement*) | |
| </p:with-input> |
The notes about the attributes on p:input apply to p:with-input.
The input on the p:for-each step in the example above does
not have any explicit bindings:
|<p:with-input select="//c:file"/>
That means it connects to the “default readable port”, usually the primary output of the preceding step.
2.2.1. Document inputs
The p:document element (or the href attribute
on p:with-input) connects an input to a document identified by a URI.
| <p:document | |
| href = { anyURI } | Document URI |
| content-type? = string | The required content type |
| document-properties? = map(xs:QName,item()*) | Document properties map |
| parameters? = map(xs:QName,item()*) | Parameters map |
| {any-name}* = string | Additional attributes |
| /> |
The document properties will be applied to the result. The parameters may be used during document access.
href: The document will be retrieved from this URI.
content-type: Identifies the (required) content type of the document, for example application/json or text/plain. If not specified, the content type is inferred from the URI.
document-properties: Document properties are name/value pairs associated with the document. Unqualified property names, like base-uri, are defined by the XProc specification. You can add arbitrary namespace qualified properties.
parameters: The p:document instruction is defined in terms of the p:load step. The parameters passed to it can be used by the load step to aid in retrieving the document. For example, a username and password might be passed as parameters.
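As a sketch of these attributes together, the pipeline below loads a file whose name doesn’t reveal its type, states the content type explicitly, and attaches a namespace qualified document property. The file name settings.conf and the ex:reviewed property are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ex="http://example.com/ns"
                name="main" version="3.1">
  <p:output port="result"/>

  <p:identity>
    <p:with-input>
      <!-- The content type can't be inferred from ".conf", so state it.
           The property map adds a custom, namespace qualified property. -->
      <p:document href="settings.conf"
                  content-type="application/json"
                  document-properties="map{xs:QName('ex:reviewed'): false()}"/>
    </p:with-input>
  </p:identity>
</p:declare-step>
```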
2.2.2. Inline inputs
The p:inline element connects an input to a document placed
into the pipeline directly.
| <p:inline | |
| {any-name}* = string | Additional attributes |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| content-type? = string | The content type of the inline |
| document-properties? = map(xs:QName,item()*) | Document properties map |
| encoding? = string | Encoded content (base64) |
| > | |
| anyNode* | |
| </p:inline> |
The p:inline element can be omitted in the simple case of a
single XML document.
content-type: Identifies the (required) content type of the content, for example application/json or text/plain. If not specified, the content type is assumed to be XML.
document-properties: Document properties are name/value pairs associated with the document. Unqualified property names, like base-uri, are defined by the XProc specification. You can add arbitrary namespace qualified properties.
encoding: The only supported value for encoding is base64. This encoding allows an inline to be in a different character set than the XML or to contain non-XML characters.
This p:inline element specifies that it is text. Attribute
value templates in the body are used to evaluate expressions.
|<p:inline content-type="text/plain">Processed {$total} files; {f:is-leap-day()} </p:inline>
2.2.3. Pipe inputs
The p:pipe element (or the pipe attribute
on p:with-input) connects an input to the output from some other step.
If step is omitted, the step associated with the default
readable port is assumed. If port is omitted, the primary output
port of the step is assumed. (Consequently, <p:pipe/> is a connection
to the default readable port.) It’s an error to attempt to refer to the default readable
port if there isn’t one.
| <p:pipe | |
| step? = NCName | The step name |
| port? = NCName | The port name |
| {any-name}* = string | Additional attributes |
| /> |
The p:identity step in the viewport uses a pipe
attribute to make a pipe binding:
|<p:with-input pipe="copyright@main"/>
The variable declaration for $total uses the p:pipe element to
make a pipe binding. In the context of a variable, this establishes the context item used
when evaluating the expression.
|<p:pipe step="listing"/>
2.2.4. Empty inputs
The p:empty element explicitly binds the input to an empty sequence
of documents.
| <p:empty | |
| {any-name}* = string | Additional attributes |
| /> |
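For instance, in a pipeline with no inputs there is no default readable port, so a step that needs “no documents” has to say so explicitly. A minimal sketch:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <p:output port="result"/>

  <!-- p:count reports how many documents arrive on its source port;
       here the port is explicitly bound to an empty sequence. -->
  <p:count>
    <p:with-input>
      <p:empty/>
    </p:with-input>
  </p:count>
</p:declare-step>
```

The step receives an empty sequence, so the count it reports is zero.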
2.3. Connecting option values to steps
In much the same way as inputs are provided using p:with-input, option values
are provided using p:with-option.
| <p:with-option | |
| name = EQName | The option name |
| as? = XPathSequenceType | The required value type |
| select = XPathExpression | Option value |
| collection? = boolean | Inputs as default collection? |
| href? = { anyURI } | Document URI |
| pipe? = string | A pipe binding |
| {any-name}* = string | Additional attributes |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| > | |
| ((empty | (document | pipe | inline)*) | anyElement*) | |
| </p:with-option> |
Options can also be specified as attributes
on the step itself, in which case the attribute name is the option name and its
value is interpreted as an attribute value template. For example,
in Example 2.1, “A compound step declaration”, the path and
include-filter options are set this way:
|<p:directory-list name="listing" path="{$path}"
|                  include-filter=".*\.xml$">
name: The option name. This must be the name of an option declared for the step on which it is used.
as: The type of the option, for example “xs:integer” or “map(xs:string, xs:dateTime)”.
select: The select expression is evaluated to provide a value for the option.
collection: If the collection attribute is true, all of the documents that appear on the context binding are placed in the default collection for the expression. An expression can only refer to the context item if it is a single value, but by using the default collection, an option can handle a sequence of values.
href: Shortcut for a single p:document binding.
pipe: Shortcut for one or more p:pipe bindings.
exclude-inline-prefixes: When you put XML in a p:inline element, all of the in-scope namespaces will apply to those elements. You can use exclude-inline-prefixes to exclude some of them. The value of the attribute must be a space-separated list of in-scope namespace prefixes. It’s an error to refer to a prefix that doesn’t have an in-scope declaration. The tokens #default and #all may also be used to exclude the default namespace and all namespaces, respectively.
In Example 2.1, “A compound step declaration”,
the max-depth option is set on the p:directory-list
step using p:with-option:
|<p:with-option name="max-depth"
|               select="if ($recurse) then 'unbounded' else '1'"/>
2.4. Compound steps
The XProc specification defines several compound steps, steps that contain subpipelines. XML Calabash also implements a couple of additional compound steps. (Pipelines that use extension compound steps are probably not, strictly speaking, conformant with the specification.)
2.4.1. Looping over inputs (p:for-each)
The p:for-each step loops over a sequence of documents, processing each
with its subpipeline.
| <p:for-each | |
| name? = NCName | The step name |
| {any-name}* = string | Additional attributes |
| > | |
| (with-input?, output*, subpipeline) | |
| </p:for-each> |
Unlike XPath, if a select expression is used
on the p:with-input the nodes selected from the original document(s)
are not what the loop iterates over. Instead, a whole new document
is constructed for each selection and that document is processed.
This means, for example, that “@name” can’t be used to test
an attribute on the loop input. You need to use “/*/@name” or something
similar instead.
The p:for-each in Example 2.9, “Looping with for-each” loops over the
c:file elements from the p:directory-list step, but each
one will be in its own document.
|<p:for-each name="loop">
|  <p:with-input select="//c:file"/>
|  …
|</p:for-each>
In principle, a p:for-each can process its inputs in parallel.
XML Calabash does not do so at this time.
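To make the point about selected nodes becoming new documents concrete, here is a sketch that labels each chapter using one of its own attributes. The chapter element and the id and label attributes are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>

  <p:for-each>
    <p:with-input select="//chapter"/>

    <!-- Each chapter is now the root element of its own document, so the
         attribute is reached with /*/@id; a bare @id would be evaluated
         against the document node and find nothing. -->
    <p:variable name="id" select="/*/@id"/>

    <p:add-attribute match="/*" attribute-name="label"
                     attribute-value="chapter-{$id}"/>
  </p:for-each>
</p:declare-step>
```

The result is a sequence of documents, one per chapter, rather than a single document with modified chapters. Use p:viewport when the changes should stay in place in the original document.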
2.4.2. Changing internal structures (p:viewport)
The p:viewport step replaces sections of a document with the result
of processing (a document containing) those sections using the subpipeline.
| <p:viewport | |
| name? = NCName | The step name |
| match = XSLTSelectionPattern | Match pattern for content to replace |
| {any-name}* = string | Additional attributes |
| > | |
| (with-input?, output?, subpipeline) | |
| </p:viewport> |
The p:viewport in Example 2.10, “Matching with viewport” matches
each copyright element where the holder is “Someone Random” and replaces
that element with the result of its subpipeline, in this case, with the contents
of the document on the copyright input port.
|<p:viewport match="copyright[. = 'Someone Random']">
|  <p:identity>
|    <p:with-input pipe="copyright@main"/>
|  </p:identity>
|</p:viewport>
Like select on p:for-each, the elements
matched by p:viewport are available to the subpipeline as
new documents.
2.4.3. Choosing among alternatives (p:choose)
The p:choose step allows you to select processing among a number
of alternatives. At most one alternative will be used: either the first p:when
(in document order) for which the test expression is true or the p:otherwise.
| <p:choose | |
| name? = NCName | The step name |
| {any-name}* = string | Additional attributes |
| > | |
| (with-input?, ((when+, otherwise?) | (when*, otherwise))) | |
| </p:choose> |
The example pipeline in Example 2.11, “An example choice” uses
p:choose to select which stylesheet to use when formatting a document
based on the document status.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                xmlns:ex="http://example.com/ns"
|                name="main" version="3.1" type="ex:format">
|  <p:input port="source"/>
|  <p:output port="result"/>
|
|  <p:choose>
|    <p:when test="/*/@status = 'draft'">
|      <p:xslt>
|        <p:with-input port="stylesheet" href="draft.xsl"/>
|      </p:xslt>
|    </p:when>
|    <p:when test="/*/@status = 'final'">
|      <p:with-input pipe="source"/>
|      <p:xslt>
|        <p:with-input port="stylesheet" href="final.xsl"/>
|      </p:xslt>
|    </p:when>
|    <p:otherwise>
|      <p:error code="ex:bad-status">
|        <p:with-input>
|          <p:inline content-type="text/plain">Unexpected status</p:inline>
|        </p:with-input>
|      </p:error>
|    </p:otherwise>
|  </p:choose>
|</p:declare-step>
Each alternative is a p:when which contains the subpipeline to run
if this alternative is selected.
| <p:when | |
| name? = NCName | The step name |
| test = XPathExpression | The test expression |
| collection? = boolean | Inputs as default collection? |
| {any-name}* = string | Additional attributes |
| > | |
| (with-input?, output*, subpipeline) | |
| </p:when> |
The first (and only the first) p:when where the test expression evaluates
to true is used.
name: The step name is only used by the subpipeline in the p:when. It’s how a step in the subpipeline can refer, for example with p:pipe, to the context input.
test: The test expression. This p:when is selected if it is the first p:when, in document order, where the expression evaluates to true. It’s an error if the expression refers to the context item and there is not exactly one context item provided by the p:with-input.
collection: If the collection attribute is true, all of the documents that appear on the context binding are placed in the default collection for the expression. An expression can only refer to the context item if it is a single value, but by using the default collection, a test expression can handle a sequence of values.
The first p:when selects documents with a status of “draft”:
|<p:when test="/*/@status = 'draft'">
|  <p:xslt>
|    <p:with-input port="stylesheet" href="draft.xsl"/>
|  </p:xslt>
|</p:when>
The second p:when selects documents with a status of “final”.
This example includes an explicit binding for the input that will be used to set
the context item for the test expression. It’s unnecessary as it’s the same as the default
readable port in this case.
|<p:when test="/*/@status = 'final'">
|  <p:with-input pipe="source"/>
|  <p:xslt>
|    <p:with-input port="stylesheet" href="final.xsl"/>
|  </p:xslt>
|</p:when>
The p:otherwise contains the subpipeline to run
if no other alternative is selected.
| <p:otherwise | |
| name? = NCName | The step name |
| {any-name}* = string | Additional attributes |
| > | |
| (output*, subpipeline) | |
| </p:otherwise> |
The p:otherwise in this example raises an error if the status is neither
draft nor final:
|<p:otherwise>
|  <p:error code="ex:bad-status">
|    <p:with-input>
|      <p:inline content-type="text/plain">Unexpected status</p:inline>
|    </p:with-input>
|  </p:error>
|</p:otherwise>
2.4.4. Simple conditionals (p:if)
The p:if is a simplified form of p:choose. If the
test expression is true, then the subpipeline is run and that determines the output
from the step. If the expression is false, p:if operates as an identity step,
passing its input through unchanged.
| <p:if | |
| name? = NCName | The step name |
| test = XPathExpression | The test expression |
| collection? = boolean | Inputs as default collection? |
| {any-name}* = string | Additional attributes |
| > | |
| (with-input?, output*, subpipeline) | |
| </p:if> |
The notes about the attributes on p:when apply to p:if.
The p:if in Example 2.12, “Using an if instruction” deletes the
count attribute if it has the value “0”.
|<p:if test="xs:integer(/doc/@count) = 0">
|  <p:delete match="/doc/@count"/>
|</p:if>
If the attribute has any other value (or isn’t present), the document
passes through as if the p:if were a p:identity step.
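To make the "simplified form of p:choose" concrete, the same behavior can be sketched with an explicit p:choose. This is an illustrative equivalent, not a listing from the reference examples.

```xml
<p:choose>
  <p:when test="xs:integer(/doc/@count) = 0">
    <p:delete match="/doc/@count"/>
  </p:when>
  <p:otherwise>
    <!-- The branch that p:if supplies implicitly: pass the input through. -->
    <p:identity/>
  </p:otherwise>
</p:choose>
```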
2.4.5. Grouping (p:group)
The p:group step is just a wrapper around a subpipeline.
| <p:group | |
| name? = NCName | The step name |
| {any-name}* = string | Additional attributes |
| > | |
| (output*, subpipeline) | |
| </p:group> |
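One plausible reason to reach for p:group, offered here as an assumption because the summary above does not list use cases, is to keep a variable local to a few steps. The variable and attribute names are made up for illustration.

```xml
<p:group name="stamp">
  <!-- $today is only in scope for the steps inside this group. -->
  <p:variable name="today" select="string(current-date())"/>
  <p:add-attribute match="/*" attribute-name="processed"
                   attribute-value="{$today}"/>
</p:group>
```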
2.4.6. Exception handling (p:try)
The p:try step allows a pipeline author to catch errors
and recover from them. The subpipeline is run. If no errors occur, the result
of the step is the result of that subpipeline. If an error does occur, each
p:catch is tested in turn and the result of the step is the result
of the first matching p:catch.
| <p:try | |
| name? = NCName | The step name |
| {any-name}* = string | Additional attributes |
| > | |
| (output*, subpipeline, ((catch+, finally?) | (catch*, finally))) | |
| </p:try> |
The example for p:choose raises an error if the status is
neither “draft” nor “final”. We can use p:try to recover from that error
and treat any such document as if it had draft status, as shown in
Example 2.13, “An example try/catch”.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                xmlns:ex="http://example.com/ns"
|                name="main" version="3.1">
|  <p:import href="choose.xpl"/>
|
|  <p:input port="source"/>
|  <p:output port="result"/>
|
|  <p:try>
|    <ex:format/>
|    <p:catch code="ex:bad-status">
|      <p:xslt>
|        <p:with-input port="source" pipe="source@main"/>
|        <p:with-input port="stylesheet" href="draft.xsl"/>
|      </p:xslt>
|    </p:catch>
|  </p:try>
|</p:declare-step>
A p:catch matches an error if the error code is in its
code list, or if it does not have a
code attribute at all.
If there are no matching catches, the p:try step fails with the
error, which may in turn be caught and handled by an enclosing p:try,
if there is one.
| <p:catch | |
| name? = NCName | The step name |
| code? = EQNameList | The error codes to catch |
| {any-name}* = string | Additional attributes |
| > | |
| (output*, subpipeline) | |
| </p:catch> |
Note that the default readable port inside the p:catch is the
error document produced by the failed pipeline. You have to make an explicit binding
if you want something else.
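The matching rules and the default readable port can be sketched together. In this fragment the ex:missing-status code and the skipped element are hypothetical; ex:format is the step used in the example above.

```xml
<p:try>
  <ex:format/>
  <!-- Matches either code in the list. ex:missing-status is hypothetical. -->
  <p:catch code="ex:bad-status ex:missing-status">
    <p:identity>
      <!-- Explicit binding: return a placeholder instead of the error. -->
      <p:with-input>
        <skipped reason="status"/>
      </p:with-input>
    </p:identity>
  </p:catch>
  <!-- No code attribute: matches any error not caught above. -->
  <p:catch>
    <!-- No explicit binding, so this reads the default readable port,
         which inside p:catch is the error document. -->
    <p:identity/>
  </p:catch>
</p:try>
```

The catch-all is placed last deliberately; as far as the XProc specification goes, the code attribute may only be omitted on the final p:catch.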
Irrespective of whether the subpipeline succeeds or fails and whether or not
a catch is invoked (and whether or not it succeeds or fails), the p:finally
subpipeline will be run.
| <p:finally | |
| name? = NCName | The step name |
| {any-name}* = string | Additional attributes |
| > | |
| (output*, subpipeline) | |
| </p:finally> |
It is very uncommon for this to be useful. One plausible use case is for the finally step to clean up any side effects that might have been introduced by the subpipeline or by a p:catch, for example deleting a temporary file or closing a database connection.
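A cleanup of that kind might look like the following sketch. It assumes the optional p:file-delete step from the XProc file steps library is available, and the scratch file name is made up. The trailing p:sink is a defensive choice so that the p:finally ends without a primary output.

```xml
<p:try>
  <p:xslt>
    <p:with-input port="stylesheet" href="final.xsl"/>
  </p:xslt>
  <p:finally>
    <!-- Runs whether or not the p:xslt step succeeds.
         "scratch/work.xml" is a hypothetical temporary file. -->
    <p:file-delete href="scratch/work.xml" fail-on-error="false"/>
    <p:sink/>
  </p:finally>
</p:try>
```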
2.4.7. Loop until a condition is true (cx:until)
This is an extension
compound step that processes single documents,
applying its subpipeline until the
test expression is true.
| <cx:until | |
| name? = NCName | The step name |
| test = XPathExpression | The test expression |
| {any-name}* = string | Additional attributes |
| > | |
| (with-input?, output?, subpipeline) | |
| </cx:until> |
The test attribute specifies an XPath
expression. The subpipeline is always run at least once and the condition is
only tested at the end of the loop.
The result of the subpipeline is provided as the context item.
The previous result is provided in the variable
cx:previous.
The pipeline in Example 2.14, “Looping until a condition is true” demonstrates a horribly inefficient way to add explicit numbers to a list.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                exclude-inline-prefixes="#all"
|                name="main" version="3.1">
|  <p:output port="result" serialization="map{'indent':true()}"/>
|
|  <p:identity name="identity">
|    <p:with-input>
|      <list>
|        <item/>
|        <item/>
|        <item/>
|      </list>
|    </p:with-input>
|  </p:identity>
|
|  <cx:until test="deep-equal(., $cx:previous)">
|    <p:replace match="/list/item[1]">
|      <p:with-input port="replacement">
|        <li number="{p:iteration-position()}"/>
|      </p:with-input>
|    </p:replace>
|  </cx:until>
|
|</p:declare-step>
The first time through the loop, the first item is replaced. The next time through, the next item is replaced, etc. Looping stops when nothing changes in the document (that is, when all the items have been replaced).
The result of the cx:until step is the first document for which the
test expression is true. In this case, the result is:
|<list>
|  <li number="1"/>
|  <li number="2"/>
|  <li number="3"/>
|</list>
It is a dynamic error (err:XD0001) if the source is not a
single document.
2.4.8. Loop while a condition is true (cx:while)
This is an extension
compound step that processes single documents,
applying its subpipeline while the
test expression is true.
| <cx:while | |
| name? = NCName | The step name |
| test = XPathExpression | The test expression |
| {any-name}* = string | Additional attributes |
| > | |
| (with-input?, output?, subpipeline) | |
| </cx:while> |
The somewhat contrived pipeline in Example 2.15, “Looping while a condition is true” loops over the document adding a new first child until the count reaches zero.
|<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
|                xmlns:c="http://www.w3.org/ns/xproc-step"
|                xmlns:cx="http://xmlcalabash.com/ns/extensions"
|                xmlns:xs="http://www.w3.org/2001/XMLSchema"
|                exclude-inline-prefixes="#all"
|                name="main" version="3.1">
|  <p:output port="result" serialization="map{'indent':true()}"/>
|
|  <p:identity name="identity">
|    <p:with-input>
|      <doc count="3"/>
|    </p:with-input>
|  </p:identity>
|
|  <cx:while test="/doc/@count and xs:integer(/doc/@count) gt 0">
|    <p:insert position="first-child">
|      <p:with-input port="insertion">
|        <insertion for="{/doc/@count}"/>
|      </p:with-input>
|    </p:insert>
|
|    <p:add-attribute attribute-name="count" attribute-value="{xs:integer(/doc/@count) - 1}"/>
|
|    <p:if test="xs:integer(/doc/@count) = 0">
|      <p:delete match="/doc/@count"/>
|    </p:if>
|  </cx:while>
|
|</p:declare-step>
The result of the cx:while step is the first document for which
the test expression did not have an effective boolean value of true. In this
case, the result is:
|<doc>
|  <insertion for="1"/>
|  <insertion for="2"/>
|  <insertion for="3"/>
|</doc>
It is a dynamic error (err:XD0001) if the source is not a single
document.
The test attribute specifies an
XPath expression. The document is provided as the context item. If the
expression is false, the loop is not run (or run again).
2.5. Atomic steps
A great many pipelines that you write will be like shell scripts or “main” functions in other programming languages: they run, they do a thing, and they end. But in fact, with a little extra markup, every pipeline that you declare can also be reused as an atomic step elsewhere. In this way, there is an unbounded number of atomic steps: there are all of the standard ones (summarized in Part I, “Standard steps”), there are all of the extension steps that ship with XML Calabash (summarized in Part III, “XML Calabash extension steps”), and then there are all the steps that you write.
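A minimal sketch of that extra markup: giving a declared step a type is what lets it be called by name like any other atomic step. The ex:mark-reviewed step and what it does are made up for illustration. It is declared inline here to keep the example in one file; it could equally live in its own file and be brought in with p:import.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns"
                name="main" version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Hypothetical reusable step: the type attribute gives it a name. -->
  <p:declare-step type="ex:mark-reviewed">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:add-attribute match="/*" attribute-name="reviewed"
                     attribute-value="true"/>
  </p:declare-step>

  <!-- Used exactly like a standard atomic step. -->
  <ex:mark-reviewed/>
</p:declare-step>
```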
2.6. Variables
A subpipeline may use p:variable to hold the result of a
computation.
| <p:variable | |
| name = EQName | The variable name |
| as? = XPathSequenceType | The required value type |
| select = XPathExpression | Variable value |
| collection? = boolean | Inputs as default collection? |
| href? = { anyURI } | Document URI |
| pipe? = string | A pipe binding |
| {any-name}* = string | Additional attributes |
| exclude-inline-prefixes? = string | A space-separated list of namespace prefixes |
| > | |
| ((empty | (document | pipe | inline)*) | anyElement*) | |
| </p:variable> |
The notes about the attributes on p:with-option apply to p:variable.
Expressions which occur later in the subpipeline may refer to the variable.
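As a small sketch of a variable being declared and then used by a later step, the pipeline below counts the items in a list and records the count on the root element. The list and item element names and the items attribute are assumptions made for the example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Evaluated against the default readable port (the source input). -->
  <p:variable name="item-count" as="xs:integer"
              select="count(/list/item)"/>

  <!-- A later step refers to the variable in an attribute value template. -->
  <p:add-attribute match="/list" attribute-name="items"
                   attribute-value="{$item-count}"/>
</p:declare-step>
```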
2.7. Extra information
Documentation can be placed anywhere in the pipeline with the p:documentation
element. It’s ignored by the processor.
| <p:documentation | |
| {any-name}* = string | Additional attributes |
| > | |
| any-well-formed-content* | |
| </p:documentation> |
The p:pipeinfo element is intended for additional information that
a particular processor might use. XML Calabash uses it for assertions,
for example.
| <p:pipeinfo | |
| {any-name}* = string | Additional attributes |
| > | |
| any-well-formed-content* | |
| </p:pipeinfo> |
The p:documentation and p:pipeinfo elements have no special
significance inside a p:inline; in that context, they’re just inline
elements.
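The difference can be sketched with two p:documentation elements in one step. The doc element is a placeholder for any inline content.

```xml
<p:identity>
  <!-- Outside p:inline: documentation, ignored by the processor. -->
  <p:documentation>Produce a small sample document.</p:documentation>
  <p:with-input>
    <p:inline>
      <doc>
        <!-- Inside p:inline: ordinary content of the inline document. -->
        <p:documentation>This element is part of the data.</p:documentation>
      </doc>
    </p:inline>
  </p:with-input>
</p:identity>
```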