Chapter 2Language reference

XProc is a data flow programming language described with an XML vocabulary. This chapter provides an overview of the features of the language. At a high level, XProc allows you to combine steps, units of computation, in a variety of ways to achieve your goal. The p:for-each step, for example, will iterate over a set of documents and the p:xslt step will perform XSLT transformations.

Broadly speaking, the features are:

  • Structures for declaring pipelines,

  • structures for connecting steps to inputs,

  • compound steps,

  • atomic steps,

  • options,

  • variables,

  • and extra information

The following sections give a brief overview of the elements in the XProc vocabulary.

Note from the author

This section is something of a work-in-progress. At the moment, it’s neither a comprehensive description of every aspect of the vocabulary, nor is it tutorial in nature. But it’s useful to have every element in the vocabulary present in the reference. Suggestions for improvements are welcome.

In the summaries that follow, {any-name}* generally means any number of additional namespace qualified names. These are roughly extension attributes and are ignored unless the processor uses them for some implementation-defined purpose.

2.1Declaring pipelines

The most common pipeline declaration specifies the inputs, outputs, and options that the step accepts, followed by the steps that implement the pipeline. Pipelines may also import libraries and functions, and may declare steps.

<p:declare-step
  name? = NCNameThe step name
  type? = EQNameThe step type (for reuse)
  psvi-required? = booleanIs XML Schema validated input required?
  xpath-version? = decimalThe XPath version required
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
  version? = decimalThe XProc version (3.0 or 3.1)
  visibility? = private|publicVisible outside the library?
  {any-name}* = stringAdditional attributes
>
  (import |
   import-functions)*,
  (input |
   output |
   option)*,
  declare-step*,
  subpipeline?
</p:declare-step>

The example pipeline in Example 2.1, “A compound step declaration” shows a typical compound step declaration. In brief: it iterates over the files in a directory replacing selected copyright elements. We’ll look at several of its features in more detail below.

Notes on the attributes of p:declare-step
name

The step name is only used by the subpipeline in the declaration. It’s how a step in the subpipeline can refer, for example with p:pipe, to one of the step’s inputs.

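  |<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  |                name="main" version="3.1">
  |  <p:input port="source"/>
  |
  |  <!-- "main" lets steps in the subpipeline refer to this step's ports -->
  |  <p:identity>
  |    <p:with-input>
  |      <p:pipe step="main" port="source"/>
  |    </p:with-input>
  |  </p:identity>
  |
  |</p:declare-step>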
type

The step type is how you reuse a step. If you declare a step with the type ex:my-step, then you can subsequently use it as an atomic step: <ex:my-step> in other steps, even recursively.

  |<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  |                xmlns:ex="http://example.com/ns"
  |                name="main" version="3.1">
  |  <p:input port="source"/>
  |
  |  <p:declare-step type="ex:my-step">
  |    <p:input port="source"/>
  |
  |  </p:declare-step>
  |
  |  <ex:my-step>
  |    <p:with-input pipe="source@main"/>
  |  </ex:my-step>
  |
  |</p:declare-step>
psvi-required

If this is true, you’re telling the processor that XML Schema validated inputs are required. This will require Saxon EE.

xpath-version

This specifies the XPath version. The only version that you can use today is “3.1”, but in the future, it might be possible to specify other versions.

exclude-inline-prefixes

When you put XML in a p:inline element, all of the in-scope namespaces will apply to those elements. You can use exclude-inline-prefixes to exclude some of them. The value of the attribute must be a space-separated list of in-scope namespace prefixes. It’s an error to refer to a prefix that doesn’t have an in-scope declaration. The tokens #default and #all may also be used to exclude the default namespace and all namespaces, respectively.

version

The XProc version of the step. Only 3.0 or 3.1 are accepted and they’re equivalent.

visibility

The visibility of a step only makes sense when it occurs in a p:library. Inside a p:library a “private” step is not visible to pipelines that import the library. It can only be used by other steps declared in the library.

Note

The example pipelines and some input documents to demonstrate how they work are available from an examples directory in the repository.

  |<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  |                xmlns:c="http://www.w3.org/ns/xproc-step"
  |                xmlns:cx="http://xmlcalabash.com/ns/extensions"
  |                xmlns:xs="http://www.w3.org/2001/XMLSchema"
  |                name="main" version="3.1">
  |  <p:documentation>
  |    <div xmlns="http://www.w3.org/1999/xhtml">
  |      <p>This pipeline reads all of the files in a directory and
  |      updates the copyright element.</p>
  |    </div>
  |  </p:documentation>
  |
  |  <p:input port="copyright" content-types="xml"/>
  |  <p:output port="result" content-types="text"/>
  |  <p:option name="path" required="true" as="xs:string"/>
  |  <p:option name="output-path" required="true" as="xs:anyURI"/>
  |  <p:option name="recurse" select="false()" as="xs:boolean"/>
  |
  |  <p:directory-list name="listing" path="{$path}"
  |                    include-filter=".*\.xml$">
  |    <p:with-option name="max-depth"
  |                   select="if ($recurse) then 'unbounded' else '1'"/>
  |  </p:directory-list>
  |
  |  <p:for-each name="loop">
  |    <p:with-input select="//c:file"/>
  |    <p:variable name="filename" select="/*/@name"/>
  |
  |    <p:load href="{resolve-uri(/*/@name, base-uri(/*))}"/>
  |
  |    <p:viewport match="copyright[. = 'Someone Random']">
  |      <p:identity>
  |        <p:with-input pipe="copyright@main"/>
  |      </p:identity>
  |    </p:viewport>
  |
  |    <p:store href="{resolve-uri($filename, resolve-uri($output-path, static-base-uri()))}"/>
  |  </p:for-each>
  |
  |  <p:variable name="total" select="count(//c:file)">
  |    <p:pipe step="listing"/>
  |  </p:variable>
  |
  |  <p:identity>
  |    <p:with-input xmlns:f="http://example.com/ns/functions">
  |      <p:inline content-type="text/plain">Processed {$total} files; {f:is-leap-day()}&#10;</p:inline>
  |    </p:with-input>
  |  </p:identity>
  |</p:declare-step>
Example 2.1A compound step declaration

A simpler form of declaration specifies the inputs, outputs, and options that the step accepts, but relies on the implementation having been provided through some other means. The XML Calabash extension steps, or extension steps that you write in a JVM language yourself, follow this pattern.

<p:declare-step
  name? = NCNameThe step name
  type? = EQNameThe step type (for reuse)
  psvi-required? = booleanIs XML Schema validated input required?
  xpath-version? = decimalThe XPath version required
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
  version? = decimalThe XProc version (3.0 or 3.1)
  visibility? = private|publicVisible outside the library?
  {any-name}* = stringAdditional attributes
>
  (input |
   output |
   option)*
</p:declare-step>

An example atomic declaration is shown in Example 2.2, “An atomic step declaration”. This is the declaration for the extension step cx:fileset.

  |<p:declare-step type="cx:fileset" version="3.1"
  |                xmlns:cx="http://xmlcalabash.com/ns/extensions"
  |                xmlns:p="http://www.w3.org/ns/xproc"
  |                xmlns:xs="http://www.w3.org/2001/XMLSchema">
  |  <p:input port="source" content-types="xml" sequence="true">
  |    <p:empty/>
  |  </p:input>
  |  <p:output port="result" content-types="xml" sequence="true"/>
  |  <p:option name="path" as="xs:string" required="true"/>
  |  <p:option name="default-excludes" as="xs:boolean" select="true()"/>
  |  <p:option name="case-sensitive" as="xs:boolean" select="true()"/>
  |  <p:option name="error-on-missing-dir" as="xs:boolean" select="true()"/>
  |  <p:option name="follow-symlinks" as="xs:boolean" select="true()"/>
  |  <p:option name="includes" as="xs:string?"/>
  |  <p:option name="excludes" as="xs:string?"/>
  |  <p:option name="detailed" as="xs:boolean" select="false()"/>
  |</p:declare-step>
Example 2.2An atomic step declaration

2.1.1Pipeline inputs

The p:input element describes a step input.

<p:input
  port = NCNameThe port name
  sequence? = booleanAccept a (possibly empty) sequence of documents?
  primary? = booleanThis is the primary port?
  select? = XPathExpressionXPath selection from the inputs
  content-types? = ContentTypesAcceptable content types
  href? = { anyURI }A document binding
  {any-name}* = stringAdditional attributes
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
>
  ((empty |
    (document |
     inline)*) |
   anyElement*)
</p:input>

The input in Example 2.3, “A pipeline input” has the port name “copyright” and accepts only XML documents. If it’s the only input, it will be primary. The input doesn’t specify sequence="true", so a single document is required.

Notes on the attributes of p:input (and p:output)
port

The port name is how you refer to an input or output port. Its name must be unique.

sequence

If a sequence is allowed, any number of documents can be used on that port. If not, exactly one document must be used.

primary

If a port is primary, that’s where implicit connections are made. If there’s only one input or output port, it will be primary by default. If there’s more than one, none are primary unless indicated explicitly. At most one input port and one output port may be declared primary.

select

A select expression on an input matches the selected nodes and creates an input document for each. Matching, for example, //chapter will make the input a sequence of documents, one for each chapter element that appears on the original input.

content-types

If a list of content types is provided, only documents that are of those types are allowed. Note that XProc allows both positive and negative content types.

href

Shortcut for a single p:document binding.

exclude-inline-prefixes

You can use exclude-inline-prefixes to exclude some namespaces from a p:inline, as noted about the attributes of p:declare-step. If the exclude-inline-prefixes attribute occurs multiple times among the ancestors of an inline, the effect is the union of all such prefixes.

  |<p:input port="copyright" content-types="xml"/>
Example 2.3A pipeline input
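
To show how several of these attributes combine, here is a minimal sketch of a pipeline whose input accepts a sequence of XML documents and splits each one into chapters. The chapter element name is an assumption made for illustration. The negative content type is written with a leading minus sign, which is how the XProc specification is understood to express exclusions; treat that exact token as an assumption and check it against your processor.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.1">
  <!-- sequence="true": any number of documents may arrive on the port.
       content-types: XML documents, excluding SVG (assumed syntax).
       select: every chapter element becomes a document of its own. -->
  <p:input port="source" sequence="true"
           content-types="xml -image/svg+xml"
           select="//chapter"/>

  <!-- One result document per selected chapter -->
  <p:output port="result" sequence="true"/>

  <!-- The only input is primary by default, so p:identity reads it implicitly -->
  <p:identity/>
</p:declare-step>
```

Because the port is the only input, it does not need primary="true". If a second input were added, one of the two would have to be marked primary for the implicit connection to keep working.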

2.1.2Pipeline outputs

The p:output element describes a step output. There are three slightly different forms, depending on where the output is being declared. The pipeline and compound step forms are the same except that on a pipeline, serialization parameters may also be declared.

<p:output
  port? = NCNameThe port name
  sequence? = booleanAccept a (possibly empty) sequence of documents?
  primary? = booleanThis is the primary port?
  content-types? = ContentTypesAcceptable content types
  href? = { anyURI }A document binding
  pipe? = stringA pipe binding
  {any-name}* = stringAdditional attributes
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
  serialization? = map(xs:QName,item()*)Serialization options
>
  ((empty |
    (document |
     pipe |
     inline)*) |
   anyElement*)
</p:output>

The output in Example 2.4, “A pipeline output” has the port name “result”. The pipeline result must be a single text document.

The notes about the attributes on p:input apply to p:output. In addition, p:output has a pipe attribute that is a shortcut for p:pipe bindings. On a p:declare-step it may also have a map of serialization parameters.

  |<p:output port="result" content-types="text"/>
Example 2.4A pipeline output
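
As a sketch of the pipe shortcut and the serialization map described above, the following pipeline declares two outputs and binds each explicitly. The step name counter and the port name original are illustrative choices, not part of Example 2.1.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <p:input port="source" sequence="true"/>

  <!-- With two outputs, neither is primary unless declared so,
       so both are connected explicitly with the pipe shortcut. -->
  <p:output port="result" pipe="result@counter"
            serialization="map{'indent': true()}"/>
  <p:output port="original" sequence="true" pipe="source@main"/>

  <!-- Counts the documents that arrive on the pipeline's source port -->
  <p:count name="counter"/>
</p:declare-step>
```

Each token in the pipe attribute has the form port@step, the same form used on p:with-input in Example 2.1.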

On a compound step, no serialization can occur, so it would be pointless to specify serialization parameters.

<p:output
  port? = NCNameThe port name
  sequence? = booleanAccept a (possibly empty) sequence of documents?
  primary? = booleanThis is the primary port?
  content-types? = ContentTypesAcceptable content types
  href? = { anyURI }A document binding
  pipe? = stringA pipe binding
  {any-name}* = stringAdditional attributes
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
>
  ((empty |
    (document |
     pipe |
     inline)*) |
   anyElement*)
</p:output>

On the declaration of an atomic step, you also cannot provide any connections.

<p:output
  port? = NCNameThe port name
  sequence? = booleanAccept a (possibly empty) sequence of documents?
  primary? = booleanThis is the primary port?
  content-types? = ContentTypesAcceptable content types
  {any-name}* = stringAdditional attributes
 />

The output on the cx:fileset declaration indicates that the step can produce a sequence of XML documents.

  |<p:output port="result" content-types="xml" sequence="true"/>
Example 2.5An atomic step output

2.1.3Pipeline options

Options are declared with the p:option element.

<p:option
  name = EQNameThe option name
  as? = XPathSequenceTypeThe required value type
  values? = stringAllowed values
  static? = booleanStatic value?
  required? = booleanOption value is required?
  select? = XPathExpressionDefault value
  {any-name}* = stringAdditional attributes
  visibility? = private|publicVisible outside the library?
 />

Options can be required or have a default value and they can be typed.

Notes on the attributes of p:option
name

The option name. Options cannot shadow earlier options; they must have unique names. Non-static options can be shadowed in the subpipeline by p:variable declarations.

as

The type of the option, for example “xs:integer” or “map(xs:string, xs:dateTime)”.

values

A list of atomic values. This forms an enumeration and the option must be one of these values.

static

Is the option evaluated at compile time? Static options can be used in use-when expressions for conditional element exclusion.

required

If an option is required, the caller must provide a value for it. If it isn’t required, and no value is provided, the default value is taken from the select attribute. (If there’s no select attribute, the default value is the empty sequence.)

select

Provides a default value for the option. Option default values can refer to preceding options, but not to the step inputs.

visibility

The visibility of an option only makes sense when it occurs in a p:library. Inside a p:library a “private” option is not visible to pipelines that import the library. It can only be used by other options and steps declared in the library.

The compound step declaration in Example 2.1, “A compound step declaration” declares three options:

  |<p:option name="path" required="true" as="xs:string"/>
  |<p:option name="output-path" required="true" as="xs:anyURI"/>
  |<p:option name="recurse" select="false()" as="xs:boolean"/>
Example 2.6Pipeline options

The path and output-path options are required and must be a string and a URI, respectively (there’s no practical reason to make them different types in this case; it’s just to make the example more interesting). The recurse option is not required and will default to “false”.
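
The values attribute and defaults that refer to preceding options are not shown in Example 2.6. The sketch below is one way they might be used. The option names format, basename, and filename are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.1">
  <p:output port="result"/>

  <!-- An enumeration: any value other than these three is rejected -->
  <p:option name="format" as="xs:string"
            values="('html', 'pdf', 'epub')" select="'html'"/>

  <!-- Required: the caller must supply a value -->
  <p:option name="basename" as="xs:string" required="true"/>

  <!-- The default refers to the two preceding options -->
  <p:option name="filename" as="xs:string"
            select="concat($basename, '.', $format)"/>

  <p:identity>
    <p:with-input>
      <p:inline content-type="text/plain">Writing {$filename}</p:inline>
    </p:with-input>
  </p:identity>
</p:declare-step>
```

The default for filename may use $basename and $format because they are declared before it. It could not refer to a document on one of the step's inputs.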

Options can be declared static:

<p:option
  name = EQNameThe option name
  as? = XPathSequenceTypeThe required value type
  values? = stringAllowed values
  static = "true"Static value?
  select = XPathExpressionDefault value
  {any-name}* = stringAdditional attributes
  visibility? = private|publicVisible outside the library?
 />

Options inside a p:library must be declared static.

It must be possible to evaluate a static option without reference to any pipeline input documents. It is evaluated “at compile time”. You may not shadow a static option with another option or p:variable.
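
A common use of a static option is to switch parts of a pipeline on or off with use-when. The following is a minimal sketch under that assumption. The option name debug and the attribute it adds are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.1">
  <p:output port="result"/>

  <!-- Evaluated at compile time, without reference to any input document -->
  <p:option name="debug" as="xs:boolean" static="true" select="false()"/>

  <p:identity>
    <p:with-input>
      <doc/>
    </p:with-input>
  </p:identity>

  <!-- This step is excluded from the pipeline unless $debug is true -->
  <p:add-attribute use-when="$debug"
                   attribute-name="debug" attribute-value="true"/>
</p:declare-step>
```

With the default value, the p:add-attribute step is removed before the pipeline runs and the result is the unchanged doc element.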

2.1.4Declaring libraries of steps

Several steps (and static options) can be bundled together in a library.

<p:library
  psvi-required? = booleanIs XML Schema validated input required?
  xpath-version? = decimalThe XPath version required
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
  version? = decimalThe XProc version (3.0 or 3.1)
  {any-name}* = stringAdditional attributes
>
  (import |
   import-functions)*,
  option*,
  declare-step*
</p:library>

A collection of step declarations and (static) options can be put inside a p:library so that they can all be imported together.

The notes about the attributes on p:declare-step apply to p:library.
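
As an illustration, the sketch below shows a small library with one static option, one private step, and one public step that uses both. The ex namespace, the step types, and the option name label are all hypothetical.

```xml
<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://example.com/ns"
           version="3.1">

  <!-- Options in a library must be static -->
  <p:option name="label" as="xs:string" static="true" select="'draft'"/>

  <!-- Private: usable only by other steps in this library -->
  <p:declare-step type="ex:mark" visibility="private">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:add-attribute attribute-name="status" attribute-value="{$label}"/>
  </p:declare-step>

  <!-- Public by default: visible to pipelines that import the library -->
  <p:declare-step type="ex:prepare">
    <p:input port="source"/>
    <p:output port="result"/>
    <ex:mark/>
  </p:declare-step>
</p:library>
```

A pipeline that imports this library can call ex:prepare but not ex:mark.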

2.1.5Importing libraries and function libraries

A library (or a single step) can be imported.

<p:import
  {any-name}* = stringAdditional attributes
  href = anyURIDocument URI
 />

The document URI must identify a pipeline or library document.

XML Calabash provides URIs for importing its extension steps: for example, https://xmlcalabash.com/ext/library/fileset.xpl imports the cx:fileset extension step. Generally speaking, the specified URI is retrieved and parsed for its declarations, but the library URIs that XML Calabash provides for its extension steps are resolved internally, without accessing the internet.
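
A pipeline that imports and calls the extension step might look like the sketch below. The option names come from the declaration in Example 2.2. The value given to path is only a placeholder.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="3.1">
  <!-- Imports come first, before inputs, outputs, and options -->
  <p:import href="https://xmlcalabash.com/ext/library/fileset.xpl"/>

  <p:output port="result" sequence="true"/>

  <!-- path is the only required option; "." is an illustrative value -->
  <cx:fileset path="." detailed="true"/>
</p:declare-step>
```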

XML Calabash can also import functions defined in XSLT or XQuery.

<p:import-functions
  {any-name}* = stringAdditional attributes
  href = anyURIDocument URI
  content-type? = ContentTypeThe content type
  namespace? = stringThe namespace(s) to import
 />

Importing functions allows them to be used in expressions in the pipeline.

Notes on the attributes of p:import-functions
href

The document URI must identify a library of functions.

content-type

If a content-type is provided, it informs the processor what is expected from the imported library.

namespace

A whitespace-separated list of namespace URIs. If provided, only functions declared in one of those namespaces will be imported.

The stylesheet in Example 2.7, “A function library in XSLT” defines two functions, f:is-leap-day with no arguments and f:is-leap-day with a single date argument.

  |<?xml version="1.0" encoding="utf-8"?>
  |<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  |                xmlns:xs="http://www.w3.org/2001/XMLSchema"
  |                xmlns:f="http://example.com/ns/functions"
  |                exclude-result-prefixes="xs"
  |                version="3.0">
  |
  |<xsl:function name="f:is-leap-day">
  |  <xsl:sequence select="f:is-leap-day(current-date())"/>
  |</xsl:function>
  |
  |<xsl:function name="f:is-leap-day">
  |  <xsl:param name="date"/>
  |  <xsl:choose>
  |    <xsl:when test="$date instance of xs:date">
  |      <xsl:sequence select="month-from-date($date) = 2
  |                            and day-from-date($date) = 29"/>
  |    </xsl:when>
  |    <xsl:when test="$date instance of xs:dateTime">
  |      <xsl:sequence select="month-from-dateTime($date) = 2
  |                            and day-from-dateTime($date) = 29"/>
  |    </xsl:when>
  |    <xsl:when test="$date castable as xs:date">
  |      <xsl:variable name="dt" select="$date cast as xs:date"/>
  |      <xsl:sequence select="month-from-date($dt) = 2
  |                            and day-from-date($dt) = 29"/>
  |    </xsl:when>
  |    <xsl:when test="$date castable as xs:dateTime">
  |      <xsl:variable name="dt" select="$date cast as xs:dateTime"/>
  |      <!-- $dt is an xs:dateTime here, so use the dateTime accessors -->
  |      <xsl:sequence select="month-from-dateTime($dt) = 2
  |                            and day-from-dateTime($dt) = 29"/>
  |    </xsl:when>
  |    <xsl:otherwise>
  |      <xsl:sequence select="false()"/>
  |    </xsl:otherwise>
  |  </xsl:choose>
  |</xsl:function>
  |
  |</xsl:stylesheet>
Example 2.7A function library in XSLT

After the library has been imported, the functions that it defines can be used in XPath expressions in the pipeline. The pipeline in Example 2.8, “Using the function library” outputs a single document that answers the questions, “is today a leap day and is 29 February 2028 a leap day?”

  |<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  |                xmlns:f="http://example.com/ns/functions"
  |                name="main" version="3.1">
  |  <p:documentation>
  |    <div xmlns="http://www.w3.org/1999/xhtml">
  |      <p>Example of importing functions. This requires Saxon EE.</p>
  |    </div>
  |  </p:documentation>
  |
  |  <p:import-functions href="is-leap-day.xsl"/>
  |
  |  <p:output port="result" serialization="map{'indent':true()}"/>
  |
  |  <p:identity>
  |    <p:with-input exclude-inline-prefixes="#all">
  |      <leap-days>
  |        <today date="{substring(string(current-date()), 1, 10)}"
  |               >{f:is-leap-day()}</today>
  |        <other date="2028-02-29"
  |               >{f:is-leap-day('2028-02-29')}</other>
  |      </leap-days>
  |    </p:with-input>
  |  </p:identity>
  |</p:declare-step>
Example 2.8Using the function library

If you have Saxon EE and you run the pipeline, the output will be something like:

<leap-days>
   <today date="2025-07-26">false</today>
   <other date="2028-02-29">true</other>
</leap-days>

Note that the p:output element in the pipeline uses the serialization options to pretty-print the output and the p:with-input uses exclude-inline-prefixes to avoid having the namespace declaration for “f” on the output.

2.2Connecting steps to inputs

Atomic and compound steps use p:with-input to describe how their inputs are connected.

<p:with-input
  port? = NCNameThe port name
  select? = XPathExpressionXPath selection from the inputs
  href? = { anyURI }Document URI
  pipe? = stringA pipe binding
  {any-name}* = stringAdditional attributes
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
>
  ((empty |
    (document |
     pipe |
     inline)*) |
   anyElement*)
</p:with-input>

The notes about the attributes on p:input apply to p:with-input.

The input on the p:for-each step in the example above does not have any explicit bindings:

  |<p:with-input select="//c:file"/>

That means it connects to the “default readable port”, usually the primary output of the preceding step.
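
To make the implicit connection concrete, the sketch below spells out the same binding explicitly with the pipe shortcut. It assumes the preceding step is a p:directory-list named listing, as in Example 2.1. The path value is illustrative.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.1">
  <p:output port="result" sequence="true"/>

  <p:directory-list name="listing" path="."/>

  <!-- Explicit form of the implicit connection: read the result port
       of the step named "listing", then apply the select expression -->
  <p:for-each>
    <p:with-input select="//c:file" pipe="result@listing"/>
    <p:identity/>
  </p:for-each>
</p:declare-step>
```

Removing the pipe attribute gives the same behavior here, because the output of p:directory-list is the default readable port at that point.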

2.2.1Document inputs

The p:document element (or the href attribute on p:with-input) connects an input to a document identified with a URI.

<p:document
  href = { anyURI }Document URI
  content-type? = stringThe required content type
  document-properties? = map(xs:QName,item()*)Document properties map
  parameters? = map(xs:QName,item()*)Parameters map
  {any-name}* = stringAdditional attributes
 />

The document properties will be applied to the result. The parameters may be used during document access.

Notes on the attributes of p:document
href

The document will be retrieved from this URI.

content-type

Identifies the (required) content type of the document, for example application/json or text/plain. If not specified, the content type is inferred from the URI.

document-properties

Document properties are name/value pairs associated with the document. Unqualified property names, like base-uri, are defined by the XProc specification. You can add arbitrary namespace-qualified properties.

parameters

The p:document instruction is defined in terms of the p:load step. The parameters passed to it can be used by the load step to aid in retrieving the document. For example, a username and password might be passed as parameters.
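
As a brief sketch of these attributes together, the pipeline below loads a text file and attaches a custom document property. The file name notes.txt, the ex namespace, and the property name ex:origin are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ex="http://example.com/ns"
                version="3.1">
  <p:output port="result" content-types="text"/>

  <p:identity>
    <p:with-input>
      <!-- content-type states what the document must be;
           the property map adds a namespace-qualified property -->
      <p:document href="notes.txt" content-type="text/plain"
                  document-properties="map{xs:QName('ex:origin'): 'local'}"/>
    </p:with-input>
  </p:identity>
</p:declare-step>
```

When no properties or parameters are needed, href="notes.txt" directly on the p:with-input is the shorter form.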

2.2.2Inline inputs

The p:inline element connects an input to a document placed into the pipeline directly.

<p:inline
  {any-name}* = stringAdditional attributes
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
  content-type? = stringThe content type of the inline
  document-properties? = map(xs:QName,item()*)Document properties map
  encoding? = stringEncoded content (base64)
>
  anyNode*
</p:inline>

The p:inline element can be omitted in the simple case of a single XML document.

Notes on the attributes of p:inline
content-type

Identifies the (required) content type of the content, for example application/json or text/plain. If not specified, the content type is assumed to be XML.

document-properties

Document properties are name/value pairs associated with the document. Unqualified property names, like base-uri, are defined by the XProc specification. You can add arbitrary namespace-qualified properties.

encoding

The only supported value for encoding is base64. This encoding allows an inline to be in a different character set than the XML or to contain non-XML characters.

This p:inline element specifies that its content is text. Text value templates in the body are used to evaluate expressions.

  |<p:inline content-type="text/plain">Processed {$total} files; {f:is-leap-day()}&#10;</p:inline>
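
Two further forms are worth a sketch: the implicit inline, where p:inline is omitted, and a base64-encoded inline. The element name greeting and the step names are illustrative. The encoded text is the string “Hello, world!”, and the explicit charset parameter is a cautious assumption about how the decoded bytes should be read.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.1">
  <p:output port="result" sequence="true"
            pipe="result@implicit result@encoded"/>

  <!-- Implicit inline: a single XML document needs no p:inline wrapper -->
  <p:identity name="implicit">
    <p:with-input>
      <greeting>Hello</greeting>
    </p:with-input>
  </p:identity>

  <!-- Explicit inline holding base64-encoded text -->
  <p:identity name="encoded">
    <p:with-input>
      <p:inline content-type="text/plain; charset=UTF-8"
                encoding="base64">SGVsbG8sIHdvcmxkIQ==</p:inline>
    </p:with-input>
  </p:identity>
</p:declare-step>
```

The two forms cannot be mixed inside one p:with-input, which is why the sketch uses two steps.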

2.2.3Pipe inputs

The p:pipe element (or the pipe attribute on p:with-input) connects an input to the output from some other step. If step is omitted, the step associated with the default readable port is assumed. If port is omitted, the primary output port of the step is assumed. (Consequently, <p:pipe/> is a connection to the default readable port.) It’s an error to attempt to refer to the default readable port if there isn’t one.

<p:pipe
  step? = NCNameThe step name
  port? = NCNameThe port name
  {any-name}* = stringAdditional attributes
 />

The p:identity step in the viewport uses a pipe attribute to make a pipe binding:

  |<p:with-input pipe="copyright@main"/>

The variable declaration for $total uses the p:pipe element to make a pipe binding. In the context of a variable, this establishes the context item used when evaluating the expression.

  |<p:pipe step="listing"/>
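
The sketch below gathers the pipe forms in one place: a pipe attribute with several tokens, and the equivalent p:pipe elements. The port name extra and the step names are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <p:input port="source" primary="true"/>
  <p:input port="extra"/>
  <p:output port="result" sequence="true"/>

  <p:identity name="first"/>

  <!-- Attribute form: two port@step tokens, read in order -->
  <p:identity name="second">
    <p:with-input pipe="result@first extra@main"/>
  </p:identity>

  <!-- Element form of the same two connections -->
  <p:identity>
    <p:with-input>
      <p:pipe step="first" port="result"/>
      <p:pipe step="main" port="extra"/>
    </p:with-input>
  </p:identity>
</p:declare-step>
```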

2.2.4Empty inputs

The p:empty element explicitly binds the input to an empty sequence of documents.

<p:empty
  {any-name}* = stringAdditional attributes
 />
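
A minimal sketch of where p:empty is useful: the pipeline below has no inputs, so there is no default readable port for the first step to read. Binding its input to p:empty supplies an empty sequence instead.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.1">
  <p:output port="result"/>

  <!-- Without this binding, the step's primary input would be unconnected -->
  <p:count>
    <p:with-input>
      <p:empty/>
    </p:with-input>
  </p:count>
</p:declare-step>
```

The step then reports a count of zero documents.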

2.3Connecting option values to steps

In much the same way as inputs are provided using p:with-input, option values are provided using p:with-option.

<p:with-option
  name = EQNameThe option name
  as? = XPathSequenceTypeThe required value type
  select = XPathExpressionOption value
  collection? = booleanInputs as default collection?
  href? = { anyURI }Document URI
  pipe? = stringA pipe binding
  {any-name}* = stringAdditional attributes
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
>
  ((empty |
    (document |
     pipe |
     inline)*) |
   anyElement*)
</p:with-option>

Options can also be specified as attributes on the step itself, in which case the attribute name is the option name and its value is interpreted as an attribute value template. For example, in Example 2.1, “A compound step declaration”, the path and include-filter options are set this way:

  |<p:directory-list name="listing" path="{$path}"
  |                  include-filter=".*\.xml$">
Notes on the attributes of p:with-option
name

The option name. This must be the name of an option declared for the step on which it is used.

as

The type of the option, for example “xs:integer” or “map(xs:string, xs:dateTime)”.

select

The select expression is evaluated to provide a value for the option.

collection

If the collection attribute is true, all of the documents that appear on the context binding are placed in the default collection for the expression. An expression can only refer to the context item if it is a single value, but by using the default collection, an option can handle a sequence of values.

href

Shortcut for a single p:document binding.

pipe

Shortcut for one or more p:pipe bindings.

exclude-inline-prefixes

When you put XML in a p:inline element, all of the in-scope namespaces will apply to those elements. You can use exclude-inline-prefixes to exclude some of them. The value of the attribute must be a space-separated list of in-scope namespace prefixes. It’s an error to refer to a prefix that doesn’t have an in-scope declaration. The tokens #default and #all may also be used to exclude the default namespace and all namespaces, respectively.

In Example 2.1, “A compound step declaration”, the max-depth option is set on the p:directory-list using p:with-option:

  |<p:with-option name="max-depth"
  |               select="if ($recurse) then 'unbounded' else '1'"/>
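
A p:with-option can also take its context from a different step than the one that feeds the step's input. The sketch below assumes a directory listing step named listing. The summary element and the files attribute are invented for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.1">
  <p:output port="result"/>

  <p:directory-list name="listing" path="."/>

  <p:add-attribute attribute-name="files">
    <!-- The document being changed is an inline element... -->
    <p:with-input>
      <summary/>
    </p:with-input>
    <!-- ...but the option value is computed against the listing -->
    <p:with-option name="attribute-value"
                   select="string(count(//c:file))"
                   pipe="result@listing"/>
  </p:add-attribute>
</p:declare-step>
```

Here the pipe attribute on p:with-option sets the context item for the select expression. It does not change what the step reads on its source port.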

2.4Compound steps

The XProc specification defines several compound steps, steps that contain subpipelines. XML Calabash also implements a couple of additional compound steps. (Pipelines that use extension compound steps are probably not, strictly speaking, conformant with the specification.)

2.4.1Looping over inputs (p:for-each)

The p:for-each step loops over a sequence of documents, processing each with its subpipeline.

<p:for-each
  name? = NCNameThe step name
  {any-name}* = stringAdditional attributes
>
  (with-input?,
   output*,
   subpipeline)
</p:for-each>
Tip

Unlike XPath, if a select expression is used on the p:with-input the nodes selected from the original document(s) are not what the loop iterates over. Instead, a whole new document is constructed for each selection and that document is processed.

This means, for example, that “@name” can’t be used to test an attribute on the loop input. You need to use “/*/@name” or something similar instead.

The p:for-each in Example 2.9, “Looping with for-each” loops over the c:file elements from the p:directory-list step, but each one will be in its own document.

  |<p:for-each name="loop">
  |  <p:with-input select="//c:file"/>
  |
  |</p:for-each>
Example 2.9Looping with for-each

In principle, a p:for-each can process its inputs in parallel. XML Calabash does not do so at this time.
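
The following sketch shows the point made in the tip above: inside the loop, each selected element is the root of its own document. The list and item elements are invented for illustration. p:iteration-position() and p:iteration-size() are XProc functions that report where the loop is.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.1">
  <p:output port="result" sequence="true"/>

  <p:for-each>
    <p:with-input select="/list/item">
      <list>
        <item>a</item>
        <item>b</item>
        <item>c</item>
      </list>
    </p:with-input>

    <!-- The default match pattern, /*, matches the item element
         because it is the document element inside the loop -->
    <p:add-attribute attribute-name="position"
        attribute-value="{p:iteration-position()} of {p:iteration-size()}"/>
  </p:for-each>
</p:declare-step>
```

The result is a sequence of three separate item documents, not a single list.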

2.4.2Changing internal structures (p:viewport)

The p:viewport step replaces sections of a document with the result of processing (a document containing) those sections using the subpipeline.

<p:viewport
  name? = NCNameThe step name
  match = XSLTSelectionPatternMatch pattern for content to replace
  {any-name}* = stringAdditional attributes
>
  (with-input?,
   output?,
   subpipeline)
</p:viewport>

The p:viewport in Example 2.10, “Matching with viewport” matches each copyright element where the holder is “Someone Random” and replaces that element with the result of its subpipeline, in this case, with the contents of the document on the copyright input port.

  |<p:viewport match="copyright[. = 'Someone Random']">
  |  <p:identity>
  |    <p:with-input pipe="copyright@main"/>
  |  </p:identity>
  |</p:viewport>
Example 2.10Matching with viewport
Tip

Like select on p:for-each, the elements matched by p:viewport are available to the subpipeline as new documents.
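
For comparison with p:for-each, the sketch below processes each matched element as its own document but returns one document with the results put back in place. The doc and section elements and the n attribute are invented for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.1">
  <p:output port="result"/>

  <p:viewport match="section">
    <p:with-input>
      <doc>
        <section>One</section>
        <section>Two</section>
      </doc>
    </p:with-input>

    <!-- Runs once per matched section; the result replaces that section -->
    <p:add-attribute attribute-name="n"
                     attribute-value="{p:iteration-position()}"/>
  </p:viewport>
</p:declare-step>
```

The output is a single doc document in which each section carries a number.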

2.4.3Choosing among alternatives (p:choose)

The p:choose step allows you to select processing among a number of alternatives. At most one alternative will be used: either the first p:when (in document order) for which the test expression is true or the p:otherwise.

<p:choose
  name? = NCNameThe step name
  {any-name}* = stringAdditional attributes
>
  (with-input?,
   ((when+,
     otherwise?) |
    (when*,
     otherwise)))
</p:choose>

The example pipeline in Example 2.11, “An example choice” uses p:choose to select which stylesheet to use when formatting a document based on the document status.

  |<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  |                xmlns:c="http://www.w3.org/ns/xproc-step"
  |                xmlns:cx="http://xmlcalabash.com/ns/extensions"
  |                xmlns:xs="http://www.w3.org/2001/XMLSchema"
  |                xmlns:ex="http://example.com/ns"
  |                name="main" version="3.1" type="ex:format">
  |  <p:input port="source"/>
  |  <p:output port="result"/>
  |
  |  <p:choose>
  |    <p:when test="/*/@status = 'draft'">
  |      <p:xslt>
  |        <p:with-input port="stylesheet" href="draft.xsl"/>
  |      </p:xslt>
  |    </p:when>
  |    <p:when test="/*/@status = 'final'">
  |      <p:with-input pipe="source"/>
  |      <p:xslt>
  |        <p:with-input port="stylesheet" href="final.xsl"/>
  |      </p:xslt>
  |    </p:when>
  |    <p:otherwise>
  |      <p:error code="ex:bad-status">
  |        <p:with-input>
  |          <p:inline content-type="text/plain">Unexpected status</p:inline>
  |        </p:with-input>
  |      </p:error>
  |    </p:otherwise>
  |  </p:choose>
  |</p:declare-step>
Example 2.11An example choice

Each alternative is a p:when which contains the subpipeline to run if this alternative is selected.

<p:when
  name? = NCNameThe step name
  test = XPathExpressionThe test expression
  collection? = booleanInputs as default collection?
  {any-name}* = stringAdditional attributes
>
  (with-input?,
   output*,
   subpipeline)
</p:when>

The first (and only the first) p:when where the test expression evaluates to true is used.

Notes on the attributes of p:when
name

The step name is only used by the subpipeline in the p:when. It’s how a step in the subpipeline can refer, for example with p:pipe, to the context input.

test

The test expression. This p:when is selected if it is the first p:when, in document order, where the expression evaluates to true. It’s an error for the expression to refer to the context item unless the p:with-input provides exactly one context item.

collection

If the collection attribute is true, all of the documents that appear on the context binding are placed in the default collection for the expression. An expression can only refer to the context item if it is a single value, but by using the default collection, a test expression can handle a sequence of documents.
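
The following is a sketch of the collection attribute in use, not an example from the XML Calabash distribution. It assumes a pipeline that accepts a sequence of documents; the wrapper name batch is hypothetical. The test counts the documents in the default collection rather than relying on a single context item:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <!-- Assumption: the pipeline accepts any number of documents. -->
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>

  <p:choose>
    <!-- With collection="true" the inputs form the default collection,
         so the test can count them without a single context item. -->
    <p:when test="count(collection()) gt 1" collection="true">
      <!-- "batch" is a hypothetical wrapper element name. -->
      <p:wrap-sequence wrapper="batch"/>
    </p:when>
    <p:otherwise>
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

When more than one document arrives, they are wrapped in a single batch element; otherwise the input passes through unchanged.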

The first p:when selects documents with a status of “draft”:

<p:when test="/*/@status = 'draft'">
  <p:xslt>
    <p:with-input port="stylesheet" href="draft.xsl"/>
  </p:xslt>
</p:when>

The second p:when selects documents with a status of “final”. This example includes an explicit binding for the input that will be used to set the context item for the test expression. It’s unnecessary as it’s the same as the default readable port in this case.

<p:when test="/*/@status = 'final'">
  <p:with-input pipe="source"/>
  <p:xslt>
    <p:with-input port="stylesheet" href="final.xsl"/>
  </p:xslt>
</p:when>

The p:otherwise contains the subpipeline to run if no other alternative is selected.

<p:otherwise
  name? = NCNameThe step name
  {any-name}* = stringAdditional attributes
>
  (output*,
   subpipeline)
</p:otherwise>

The p:otherwise in this example raises an error if the status is neither draft nor final:

<p:otherwise>
  <p:error code="ex:bad-status">
    <p:with-input>
      <p:inline content-type="text/plain">Unexpected status</p:inline>
    </p:with-input>
  </p:error>
</p:otherwise>

2.4.4Simple conditionals (p:if)

The p:if is a simplified form of p:choose. If the test expression is true, then the subpipeline is run and that determines the output from the step. If the expression is false, p:if operates as an identity step, passing its input through unchanged.

<p:if
  name? = NCNameThe step name
  test = XPathExpressionThe test expression
  collection? = booleanInputs as default collection?
  {any-name}* = stringAdditional attributes
>
  (with-input?,
   output*,
   subpipeline)
</p:if>

The notes about the attributes on p:when apply to p:if.

The p:if in Example 2.12, “Using an if instruction” deletes the count attribute if it has the value “0”.

<p:if test="xs:integer(/doc/@count) = 0">
  <p:delete match="/doc/@count"/>
</p:if>
Example 2.12Using an if instruction

If the attribute has any other value (or isn’t present), the document passes through unchanged, as if the p:if were a p:identity step.
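
To make the “simplified form of p:choose” relationship concrete, here is a sketch of the same conditional written out as a p:choose. It is meant as an illustration of the equivalence described above, not as a statement of how a processor implements p:if:

```xml
<!-- Assumes the xs prefix is bound to http://www.w3.org/2001/XMLSchema. -->
<p:choose>
  <p:when test="xs:integer(/doc/@count) = 0">
    <p:delete match="/doc/@count"/>
  </p:when>
  <p:otherwise>
    <!-- The false case: pass the input through unchanged. -->
    <p:identity/>
  </p:otherwise>
</p:choose>
```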

2.4.5Grouping (p:group)

The p:group step is just a wrapper around a subpipeline.

<p:group
  name? = NCNameThe step name
  {any-name}* = stringAdditional attributes
>
  (output*,
   subpipeline)
</p:group>
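
One plausible use of that wrapper, sketched here under assumptions (the step name tidy, the variable stamp, and the tidied attribute are all hypothetical), is to keep a variable local to the few steps that need it:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:group name="tidy">
    <!-- The variable is visible only to the steps inside this group. -->
    <p:variable name="stamp" select="string(current-date())"/>
    <p:delete match="comment()"/>
    <p:add-attribute match="/*"
                     attribute-name="tidied"
                     attribute-value="{$stamp}"/>
  </p:group>
</p:declare-step>
```

The result of the group is the result of its last step, so the rest of the pipeline can treat the group as if it were a single step.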

2.4.6Exception handling (p:try)

The p:try step allows a pipeline author to catch errors and recover from them. The subpipeline is run. If no errors occur, the result of the step is the result of that subpipeline. If an error does occur, each p:catch is tested in turn and the result of the step is the result of the first matching p:catch.

<p:try
  name? = NCNameThe step name
  {any-name}* = stringAdditional attributes
>
  (output*,
   subpipeline,
   ((catch+,
     finally?) |
    (catch*,
     finally)))
</p:try>

The example for p:choose raises an error if the status is neither “draft” nor “final”. We can use p:try to recover from that error and treat any such document as if it had draft status, as shown in Example 2.13, “An example try/catch”.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ex="http://example.com/ns"
                name="main" version="3.1">
  <p:import href="choose.xpl"/>

  <p:input port="source"/>
  <p:output port="result"/>

  <p:try>
    <ex:format/>
    <p:catch code="ex:bad-status">
      <p:xslt>
        <p:with-input port="source" pipe="source@main"/>
        <p:with-input port="stylesheet" href="draft.xsl"/>
      </p:xslt>
    </p:catch>
  </p:try>
</p:declare-step>
Example 2.13An example try/catch

A p:catch matches an error if the error code is in its code list, or if it does not have a code attribute at all.

If there are no matching catches, the p:try step fails with the error (which may be caught and handled by some p:try among its ancestors, if it has any).

<p:catch
  name? = NCNameThe step name
  code? = EQNameListThe error codes to catch
  {any-name}* = stringAdditional attributes
>
  (output*,
   subpipeline)
</p:catch>

Note that the default readable port inside the p:catch is the error document produced by the failed pipeline. You have to make an explicit binding if you want something else.
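
The matching rules can be sketched by extending the p:try from Example 2.13 with a second, catch-all p:catch. This is an illustrative fragment that assumes the surrounding pipeline from that example (the main step name, the ex prefix, and the imported ex:format step):

```xml
<p:try>
  <ex:format/>
  <!-- Matches only errors with the code ex:bad-status. -->
  <p:catch code="ex:bad-status">
    <p:xslt>
      <p:with-input port="source" pipe="source@main"/>
      <p:with-input port="stylesheet" href="draft.xsl"/>
    </p:xslt>
  </p:catch>
  <!-- No code attribute: matches any other error. -->
  <p:catch>
    <!-- The default readable port here is the error document,
         so this branch returns the error report as the result. -->
    <p:identity/>
  </p:catch>
</p:try>
```

The catch-all is placed last so that the more specific p:catch is tested first.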

The p:finally subpipeline always runs: whether the subpipeline succeeds or fails, whether or not a p:catch is invoked, and whether that p:catch itself succeeds or fails.

<p:finally
  name? = NCNameThe step name
  {any-name}* = stringAdditional attributes
>
  (output*,
   subpipeline)
</p:finally>

A p:finally is rarely useful. One plausible use is cleaning up side effects introduced by the subpipeline or the catches, for example, deleting a temporary file or closing a database connection.
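
A sketch of the temporary file case follows. It reuses the p:try from Example 2.13 and assumes, purely for illustration, that ex:format leaves behind a scratch file at the hypothetical location scratch/format.tmp:

```xml
<p:try>
  <ex:format/>
  <p:catch code="ex:bad-status">
    <p:xslt>
      <p:with-input port="source" pipe="source@main"/>
      <p:with-input port="stylesheet" href="draft.xsl"/>
    </p:xslt>
  </p:catch>
  <p:finally>
    <!-- Hypothetical scratch file; runs whether or not an error occurred. -->
    <p:file-delete href="scratch/format.tmp"/>
    <!-- Discard the report from p:file-delete so that this
         p:finally produces no output of its own. -->
    <p:sink/>
  </p:finally>
</p:try>
```

The result of the p:try still comes from the subpipeline or the p:catch; the p:finally only performs the cleanup.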

2.4.7Loop until a condition is true (cx:until)

This is an extension compound step that processes single documents, applying its subpipeline until the test expression is true.

<cx:until
  name? = NCNameThe step name
  test = XPathExpressionThe test expression
  {any-name}* = stringAdditional attributes
>
  (with-input?,
   output?,
   subpipeline)
</cx:until>

The test attribute specifies an XPath expression. The subpipeline is always run at least once and the condition is only tested at the end of the loop. The result of the subpipeline is provided as the context item. The previous result is provided in the variable cx:previous.

The pipeline in Example 2.14, “Looping until a condition is true” demonstrates a horribly inefficient way to add explicit numbers to the items in a list.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-inline-prefixes="#all"
                name="main" version="3.1">
  <p:output port="result" serialization="map{'indent':true()}"/>

  <p:identity name="identity">
    <p:with-input>
      <list>
        <item/>
        <item/>
        <item/>
      </list>
    </p:with-input>
  </p:identity>

  <cx:until test="deep-equal(., $cx:previous)">
    <p:replace match="/list/item[1]">
      <p:with-input port="replacement">
        <li number="{p:iteration-position()}"/>
      </p:with-input>
    </p:replace>
  </cx:until>

</p:declare-step>
Example 2.14Looping until a condition is true

The first time through the loop, the first item is replaced. The next time through, the next item is replaced, etc. Looping stops when nothing changes in the document (that is, when all the items have been replaced).

The result of the cx:until step is the first document for which the test expression is true. In this case, the result is:

<list>
   <li number="1"/>
   <li number="2"/>
   <li number="3"/>
</list>

It is a dynamic error (err:XD0001) if the source is not a single document.

2.4.8Loop while a condition is true (cx:while)

This is an extension compound step that processes single documents, applying its subpipeline while the test expression is true.

<cx:while
  name? = NCNameThe step name
  test = XPathExpressionThe test expression
  {any-name}* = stringAdditional attributes
>
  (with-input?,
   output?,
   subpipeline)
</cx:while>

The somewhat contrived example in Example 2.15, “Looping while a condition is true” loops over the document adding a new first child until the count reaches zero.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-inline-prefixes="#all"
                name="main" version="3.1">
  <p:output port="result" serialization="map{'indent':true()}"/>

  <p:identity name="identity">
    <p:with-input>
      <doc count="3"/>
    </p:with-input>
  </p:identity>

  <cx:while test="/doc/@count and xs:integer(/doc/@count) gt 0">
    <p:insert position="first-child">
      <p:with-input port="insertion">
        <insertion for="{/doc/@count}"/>
      </p:with-input>
    </p:insert>

    <p:add-attribute attribute-name="count" attribute-value="{xs:integer(/doc/@count) - 1}"/>

    <p:if test="xs:integer(/doc/@count) = 0">
      <p:delete match="/doc/@count"/>
    </p:if>
  </cx:while>

</p:declare-step>
Example 2.15Looping while a condition is true

The result of the cx:while step is the first document for which the test expression does not have an effective boolean value of true. In this case, the result is:

<doc>
   <insertion for="1"/>
   <insertion for="2"/>
   <insertion for="3"/>
</doc>

It is a dynamic error (err:XD0001) if the source is not a single document.

The test attribute specifies an XPath expression. The document is provided as the context item. If the expression is false, the subpipeline is not run (or not run again).

2.5Atomic steps

A great many pipelines that you write will be like shell scripts or “main” functions in other programming languages: they run, they do a thing, and they end. But in fact, with a little extra markup, every pipeline that you declare can also be reused as an atomic step elsewhere. In this way, there is an unbounded number of atomic steps: the standard ones (summarized in Part I, “Standard steps”), the extension steps that ship with XML Calabash (summarized in Part II, “Extension steps”), and all the steps that you write.
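
The extra markup is chiefly a type attribute on the declaration, as Example 2.11 does with ex:format. The sketch below shows the same idea with a nested declaration instead of an import; the step type ex:strip-comments is hypothetical:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns"
                name="main" version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Giving the declaration a type makes it callable by that name. -->
  <p:declare-step type="ex:strip-comments">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:delete match="comment()"/>
  </p:declare-step>

  <!-- Used here exactly like any other atomic step. -->
  <ex:strip-comments/>
</p:declare-step>
```

From the caller’s point of view, ex:strip-comments is indistinguishable from a standard step: it has a source input, a result output, and no visible internals.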

2.6Variables

A subpipeline may use p:variable to hold the result of a computation.

<p:variable
  name = EQNameThe variable name
  as? = XPathSequenceTypeThe required value type
  select = XPathExpressionVariable value
  collection? = booleanInputs as default collection?
  href? = { anyURI }Document URI
  pipe? = stringA pipe binding
  {any-name}* = stringAdditional attributes
  exclude-inline-prefixes? = stringA space-separated list of namespace prefixes
>
  ((empty |
    (document |
     pipe |
     inline)*) |
   anyElement*)
</p:variable>

The notes about the attributes on p:with-option apply to p:variable.

Expressions which occur later in the subpipeline may refer to the variable.
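
As a small sketch (the list and item element names and the items attribute are hypothetical), the pipeline below computes a value once and then refers to it in a later step:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                name="main" version="3.1">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Evaluated against the document on the source port. -->
  <p:variable name="item-count" as="xs:integer"
              select="count(/list/item)"/>

  <!-- A later step refers to the variable in an attribute value template. -->
  <p:add-attribute match="/list"
                   attribute-name="items"
                   attribute-value="{$item-count}"/>
</p:declare-step>
```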

2.7Extra information

Documentation can be placed anywhere in the pipeline with the p:documentation element. It’s ignored by the processor.

<p:documentation
  {any-name}* = stringAdditional attributes
>
  any-well-formed-content*
</p:documentation>
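
For illustration only, the sketch below places documentation at two levels of a pipeline. Using an XHTML paragraph for the content is an assumption; any well-formed content is allowed:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="3.1">
  <p:documentation>
    <p xmlns="http://www.w3.org/1999/xhtml">Removes every comment from the input.</p>
  </p:documentation>

  <p:input port="source"/>
  <p:output port="result"/>

  <p:delete match="comment()">
    <!-- Documentation can also sit inside an individual step. -->
    <p:documentation>Comments are not needed downstream.</p:documentation>
  </p:delete>
</p:declare-step>
```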

The p:pipeinfo element is intended for additional information that a particular processor might use. XML Calabash uses it for assertions, for example.

<p:pipeinfo
  {any-name}* = stringAdditional attributes
>
  any-well-formed-content*
</p:pipeinfo>
Note

The p:documentation and p:pipeinfo elements have no special significance inside a p:inline; in that context, they’re just inline elements.