Chapter 1. Installation

XML Calabash version 3.0.10 is available from the releases page and via Maven Central using the coordinates:

com.xmlcalabash:xmlcalabash:3.0.10

Download the latest release and unzip it on your filesystem. That will create an xmlcalabash-3.0.10/ directory containing README.md, xmlcalabash-3.0.10.jar, and a lib/ directory with many jar files. The application ships with all of the dependencies necessary to run all of the steps, including the extension steps.

System configuration

All JVM applications are sensitive to some aspects of your system configuration: for example, which version of the Java Virtual Machine you’re using and whether you need a proxy to connect to the internet.

MIME Types

An XProc processor is especially sensitive to the way MIME types are configured. An XProc pipeline can process resources of any type: XML, JSON, ZIP, etc. The content type of a resource accessed through HTTP (or HTTPS) is always identified by the server. But the content type of a resource loaded from the filesystem depends on its filename extension and how your JVM is configured.

MIME type, media type, or content type?

Yes. To quote from the Wikipedia article on media types:

The IANA and IETF use the term “media type”, and consider the term "MIME type" to be obsolete, since media types have become used in contexts unrelated to email, such as HTTP. By contrast, the WHATWG continues to use the term “MIME type” and discourages use of the term “media type” as ambiguous, since it is used with a different meaning in connection with the CSS @media feature.
The HTTP response header for providing the media type is Content-Type. The W3C has used ContentType as an XML data-type name for a media type. XDG specifications implemented by Linux desktop environments continue to use the term “MIME type”.

Following the lead of W3C with respect to XML, XProc consistently uses the term “content type”. In this user guide the term “MIME type” is used where that’s consistent with what the JVM documentation uses.

When the JVM starts, it looks for a file named “.mime.types” in the user’s home directory. If it finds one, it uses it to build a mapping from filename extensions to MIME types. The .mime.types file is a text file where each line consists of a MIME type followed by a space-separated list of filename extensions. For example:

# MIME type mapping (Lines that start with “#” are comments.)
application/json json
application/nvdl+xml nvdl
application/relax-ng-compact-syntax rnc
application/relax-ng+xml rng
application/schematron+xml sch
text/plain text txt css
application/xml xml xpl fo
application/xquery xq xqy
application/xsd+xml xsd
application/xslt+xml xsl xslt
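
To make the file format concrete, here is a minimal Java sketch of a parser for text in the layout described above. It is an illustration only, not how the JVM or XML Calabash actually reads the file. The class name MimeTypesSketch and the rule that a later line overrides an earlier one for the same extension are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses text laid out like a .mime.types file. */
final class MimeTypesSketch {

    /** Returns a map from filename extension (without the dot) to MIME type. */
    static Map<String, String> parse(String text) {
        Map<String, String> byExtension = new LinkedHashMap<>();
        for (String rawLine : text.split("\\R")) {
            String line = rawLine.strip();
            // Skip blank lines and comment lines that start with "#".
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // The first token is the MIME type; the remaining tokens are extensions.
            String[] tokens = line.split("\\s+");
            for (int i = 1; i < tokens.length; i++) {
                // Assumption: a later line overrides an earlier one.
                byExtension.put(tokens[i], tokens[0]);
            }
        }
        return byExtension;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "# MIME type mapping",
            "application/json json",
            "text/plain text txt css",
            "application/xml xml xpl fo");
        Map<String, String> types = parse(sample);
        System.out.println(types.get("xpl")); // application/xml
        System.out.println(types.get("css")); // text/plain
    }
}
```

An extension that appears on no line is simply absent from the map.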

You can also define these mappings in the XML Calabash configuration file.

If there is no MIME type defined for a particular extension, it will be identified as an “application/octet-stream” resource. That’s binary. It is very explicitly not XML or HTML or JSON or text. Most steps will reject binary resources. This will result in errors that might be very confusing if you’re not aware of the problem. To avoid that, XML Calabash takes a heavy-handed approach.

After the MIME types have been configured (from the system, from the user’s .mime.types file, and from the configuration file), XML Calabash checks the extensions listed in Table 1.1, “Default MIME type mappings”. If any of them is still identified as “application/octet-stream”, XML Calabash assigns the pragmatically more useful default from the table.
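
The fallback rule can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions, not XML Calabash’s implementation: the class and method names are hypothetical, the configured map stands in for whatever the system, the .mime.types file, and the configuration file produced, and only four rows of Table 1.1 are copied into the fallback map.

```java
import java.util.Map;

/** Hypothetical sketch of the "replace octet-stream with a table default" rule. */
final class DefaultMimeTypeSketch {
    static final String BINARY = "application/octet-stream";

    // A few rows copied from Table 1.1; the real table is longer.
    static final Map<String, String> FALLBACKS = Map.of(
        "json", "application/json",
        "xpl", "application/xproc+xml",
        "xsl", "application/xslt+xml",
        "zip", "application/zip");

    /** Looks up an extension (without the dot) in the configured mappings first. */
    static String contentTypeFor(String extension, Map<String, String> configured) {
        String type = configured.getOrDefault(extension, BINARY);
        if (BINARY.equals(type)) {
            // Only extensions that would otherwise be binary get the table default.
            return FALLBACKS.getOrDefault(extension, BINARY);
        }
        return type;
    }

    public static void main(String[] args) {
        // Assumed user configuration: .xpl is mapped to application/xml.
        Map<String, String> configured = Map.of("xpl", "application/xml");
        System.out.println(contentTypeFor("xpl", configured)); // application/xml
        System.out.println(contentTypeFor("zip", configured)); // application/zip
        System.out.println(contentTypeFor("dat", configured)); // application/octet-stream
    }
}
```

The point of the design is that an explicit mapping always wins; the table only fills gaps.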

Table 1.1. Default MIME type mappings

Extension	Default MIME type
.7z	application/x-7z-compressed
.a	application/x-archive
.arj	application/x-arj
.bmp	image/bmp
.bz2	application/bzip2
.cpio	application/x-cpio
.css	text/plain
.csv	text/csv
.dtd	application/xml-dtd
.epub	application/epub+zip
.fo	application/xml
.gz	application/gzip
.gzip	application/gzip
.jar	application/java-archive
.json	application/json
.jsonld	application/ld+json
.lzma	application/lzma
.md	text/markdown
.n3	text/n3
.nq	application/n-quads
.nt	application/n-triples
.nvdl	application/nvdl+xml
.pdf	application/pdf
.rdf	application/rdf+xml
.rj	application/rdf+json
.rnc	application/relax-ng-compact-syntax
.rng	application/relax-ng+xml
.rq	application/sparql-query
.sch	application/schematron+xml
.srj	application/sparql-results+json
.srx	application/sparql-results+xml
.svg	image/svg+xml
.tar	application/x-tar
.thrift	application/rdf+thrift
.toml	application/toml
.trig	application/trig
.trix	application/trix+xml
.ttl	text/turtle
.xspec	application/xml
.xml	application/xml
.xpl	application/xproc+xml
.xq	application/xquery
.xql	application/xquery
.xqm	application/xquery
.xquery	application/xquery
.xqy	application/xquery
.xsd	application/xsd+xml
.xsl	application/xslt+xml
.xslt	application/xslt+xml
.xz	application/xz
.yaml	application/x-yaml
.yml	application/x-yaml
.zip	application/zip

Extensions to XML Calabash may also update this table. To see the mappings actually in effect on your system, use the info mimetypes or info mimetype commands. If the defaults are problematic, you can override them with one of the existing configuration mechanisms.

Also, if you have a different convention, perhaps using the extension “.schematron” for Schematron files or “.xs” for XML Schema documents, you can provide that mapping as well. You will also want to provide mappings for other extensions you use.

Extension features

XML Calabash supports several extensions, features that are in addition to or different from the XProc specification:

Implicit validation: Automatically performs XML Schema validation on the primary input ports. See Chapter 4, Implicit validation.
Pipeline assertions: Schematron assertions can be applied automatically to step inputs and outputs. See Chapter 9, Pipeline assertions.
Eager URI resolution: There is (or was at the time of this release) an open issue, and a discussion thread on the xproc-dev mailing list, about precisely how and when xs:anyURI typed values should be resolved. See below.

Eager URI resolution

The status quo is that relative URI values (that is, values with an explicit type of xs:anyURI) are resolved “late”, at the point of use, not at the point of definition. The canonical example is this:

There is a library in file:///path/to/location/lib/lib.xpl:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:s="http://example.com/ns/steps"
                name="main" version="3.1" type="s:libidentity"
                exclude-inline-prefixes="s">
  <p:output port="result"/>
  <p:option name="file" as="xs:anyURI"/>

  <p:load>
    <p:with-option name="href" select="$file"/>
  </p:load>

</p:declare-step>

The pipeline file:///path/to/location/pipe.xpl imports this library and uses it:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:s="http://example.com/ns/steps"
                name="main" version="3.1"
                exclude-inline-prefixes="s">
  <p:import href="lib/lib.xpl"/>
  <p:output port="result"/>

  <s:libidentity file="document.xml"/>
</p:declare-step>

If you run this pipeline with the status quo interpretation, the relative URI document.xml is passed to s:libidentity. The relative URI is made absolute by the p:load step against the base URI of the p:with-option element, resulting in an attempt to load file:///path/to/location/lib/document.xml.

Notwithstanding the fact that all known implementations work this way, the specification seems to say that it is incorrect. In §14.3, we find:

If a relative URI appears in an option of type xs:anyURI, the base URI against which it must be made absolute is the base URI of the p:option element. If the option value is specified using a syntactic shortcut, the base URI of the step element on which the shortcut attribute appears must be used. In general, whenever a relative URI appears in an xs:anyURI, its base URI is the base URI of the nearest ancestor element.

If you run the pipeline above with this interpretation, the relative URI document.xml is made absolute against the base URI of the s:libidentity step on which it occurs. The absolute URI is passed to the p:load step, resulting in an attempt to load file:///path/to/location/document.xml.
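
The difference between the two interpretations comes down to which base URI the relative value is resolved against. The following Java sketch reproduces the arithmetic with the standard java.net.URI class; it is not XML Calabash code, and the class and method names are hypothetical. Only the path component is printed, because the string form that Java produces for file: URIs can differ cosmetically from the form used in this chapter.

```java
import java.net.URI;

/** Hypothetical sketch: the same relative URI resolved against two base URIs. */
final class UriResolutionSketch {

    /** Makes a (possibly relative) value absolute against the given base URI. */
    static URI resolveAgainst(String base, String value) {
        return URI.create(base).resolve(value);
    }

    public static void main(String[] args) {
        String pipeline = "file:///path/to/location/pipe.xpl";
        String library = "file:///path/to/location/lib/lib.xpl";
        String value = "document.xml";

        // Late (status quo): resolved at the point of use, inside the library.
        URI late = resolveAgainst(library, value);
        // Eager: resolved where the value is written, in the calling pipeline.
        URI eager = resolveAgainst(pipeline, value);

        System.out.println(late.getPath());  // /path/to/location/lib/document.xml
        System.out.println(eager.getPath()); // /path/to/location/document.xml
    }
}
```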

If the “eager-uri-resolution” feature is enabled (on the command line or in a configuration file), XML Calabash attempts to implement this behavior:

If an expression has a type derived from xs:anyURI, if its value is not the empty sequence, and if there is an in-scope base URI at the point where the expression is defined, the result of evaluating the expression will immediately be made absolute with respect to that base URI.

With this implementation (and a tweak to p:namespace-rename described below), all of the test suite tests still pass. This isn’t surprising, because there are no tests that explicitly attempt to test this behavior and almost all options of type xs:anyURI are explicitly defined to behave this way in the specifications.

Nevertheless, it has a couple of consequences:

It is no longer possible to pass around relative URIs in options or variables with the type xs:anyURI. Use xs:string if you wish to pass around relative URIs.
The from and to options on p:namespace-rename are defined as xs:string instead of xs:anyURI. (This isn’t, in any practical way, a user-visible change.)

It’s difficult to assess the consequences of this change. For most users, it will only affect a small percentage of pipelines, because all of the following must be true:

The pipeline must make use of a user defined step.
It must have an option explicitly declared as the type xs:anyURI.
When the user defined step is called, a relative URI must be passed as the value of the option.
And the base URI of the calling step and the called step must be different.
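
The last two conditions can be checked mechanically. The Java sketch below, again using only java.net.URI and a hypothetical class name, reports whether the eager and late interpretations would produce different absolute URIs for a given option value. It says nothing about the first two conditions, which depend on how the pipeline is written.

```java
import java.net.URI;

/** Hypothetical check: would eager resolution change what gets loaded? */
final class EagerResolutionCheck {

    /** True when resolving against the caller and the callee gives different URIs. */
    static boolean changesBehavior(URI callerBase, URI calleeBase, String optionValue) {
        URI eager = callerBase.resolve(optionValue);
        URI late = calleeBase.resolve(optionValue);
        return !eager.equals(late);
    }

    public static void main(String[] args) {
        URI pipe = URI.create("file:///path/to/location/pipe.xpl");
        URI lib = URI.create("file:///path/to/location/lib/lib.xpl");

        // Relative value, different base URIs: the two interpretations disagree.
        System.out.println(changesBehavior(pipe, lib, "document.xml")); // true
        // Absolute value: the base URI is irrelevant.
        System.out.println(changesBehavior(pipe, lib, "file:///tmp/document.xml")); // false
        // Same base URI for caller and callee: nothing changes.
        System.out.println(changesBehavior(pipe, pipe, "document.xml")); // false
    }
}
```

Note that two different base URIs in the same directory also resolve a plain filename identically, so in practice the caller and the library must live in different directories for the result to change.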

I would very much like to hear from users about any pipelines they have where enabling the “eager-uri-resolution” feature changes the behavior of the pipeline.
