Chapter 1. Installation
XML Calabash version 3.0.8 is still very much an alpha release. While it’s in alpha/beta, it’s published only on the releases page. That release is the command-line application; library releases will be published in the future.
Download the latest release and unzip it on your filesystem. That will create an xmlcalabash-3.0.8/ directory containing README.md, xmlcalabash-3.0.8.jar, and a lib/ directory that holds many jar files. The application ships with all of the dependencies necessary to run all of the steps, including the extension steps.
System configuration
All JVM applications are sensitive to some aspects of your system configuration: for example, which version of the Java Virtual Machine you’re using and whether you need a proxy to connect to the internet.
MIME Types
An XProc processor is especially sensitive to the way MIME types are configured. An XProc pipeline can process resources of any type: XML, JSON, ZIP, etc. The content type of a resource accessed through HTTP (or HTTPS) is always identified by the server. But the content type of a resource loaded from the filesystem depends on its filename extension and how your JVM is configured.
These are also called “media types”. To quote from the Wikipedia article on media types:
The IANA and IETF use the term “media type”, and consider the term "MIME type" to be obsolete, since media types have become used in contexts unrelated to email, such as HTTP. By contrast, the WHATWG continues to use the term “MIME type” and discourages use of the term “media type” as ambiguous, since it is used with a different meaning in connection with the CSS @media feature.

The HTTP response header for providing the media type is Content-Type. The W3C has used ContentType as an XML data-type name for a media type. XDG specifications implemented by Linux desktop environments continue to use the term “MIME type”.
Following the lead of W3C with respect to XML, XProc consistently uses the term “content type”. In this user guide the term “MIME type” is used where that’s consistent with what the JVM documentation uses.
When the JVM starts, it looks for a file named “.mime.types” in the user’s home directory. If it finds one, it uses it to build a mapping from filename extensions to MIME types. The .mime.types file is a text file in which each line consists of a MIME type followed by a space-separated list of filename extensions. For example:
```
# MIME type mapping (Lines that start with “#” are comments.)
application/json json
application/nvdl+xml nvdl
application/relax-ng-compact-syntax rnc
application/relax-ng+xml rng
application/schematron+xml sch
text/plain text txt css
application/xml xml xpl fo
application/xquery xq xqy
application/xsd+xml xsd
application/xslt+xml xsl xslt
```
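To make the file format concrete, here is a minimal Java sketch that reads text in the .mime.types format described above into a map from filename extension to MIME type. It is an illustration only, not the JVM’s own parser: the class name MimeTypesSketch and the method parse are hypothetical, and the sketch assumes that tokens are separated by whitespace and that a later line wins if an extension is listed twice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: read text in the ".mime.types" format into a map
// from filename extension to MIME type. This is NOT the JVM's own parser.
public class MimeTypesSketch {

    public static Map<String, String> parse(String text) {
        Map<String, String> byExtension = new HashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            // Skip blank lines and comment lines (those starting with "#")
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // The first token is the MIME type; the remaining tokens are extensions
            String[] tokens = trimmed.split("\\s+");
            for (int i = 1; i < tokens.length; i++) {
                // Assumption: if an extension is listed twice, the later line wins
                byExtension.put(tokens[i], tokens[0]);
            }
        }
        return byExtension;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "# MIME type mapping (Lines that start with \"#\" are comments.)",
                "application/json json",
                "text/plain text txt css",
                "application/xml xml xpl fo");
        Map<String, String> map = parse(sample);
        System.out.println(map.get("css")); // text/plain
        System.out.println(map.get("xpl")); // application/xml
        System.out.println(map.get("png")); // null: no mapping for this extension
    }
}
```

Note that a single line can map several extensions to one MIME type, which is why the sketch stores one map entry per extension rather than one per line.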
With XML Calabash, you can also define MIME type mappings in the configuration file.
If there is no MIME type defined for a particular extension, it will be identified as an “application/octet-stream” resource. That’s binary. It is very explicitly not XML or HTML or JSON or text. Most steps will reject binary resources. This will result in errors that might be very confusing if you’re not aware of the problem. To avoid that, XML Calabash takes a heavy-handed approach.
After the MIME types have been configured (from the system, from the user’s .mime.types file, and from the configuration file), if any of the extensions listed in Table 1.1, “Default MIME type mappings”, are still identified as “application/octet-stream” resources, XML Calabash assigns a pragmatically more useful default.
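The fallback rule can be sketched in Java as follows. This is a hypothetical illustration of the behavior just described, not XML Calabash’s actual code: the class MimeFallbackSketch, the method contentType, and the use of a plain Map to stand in for the already-configured mappings are all assumptions, and only a few rows of Table 1.1 are included.

```java
import java.util.Map;

// Hypothetical sketch of the fallback rule: a default from Table 1.1 is used
// only when the configured mappings leave an extension as octet-stream.
public class MimeFallbackSketch {
    public static final String OCTET_STREAM = "application/octet-stream";

    // A few rows from Table 1.1 (the real table is longer)
    public static final Map<String, String> DEFAULTS = Map.of(
            "json", "application/json",
            "xpl", "application/xproc+xml",
            "md", "text/markdown",
            "yaml", "application/x-yaml");

    // 'configured' stands in for the mappings from the system, the user's
    // .mime.types file, and the configuration file.
    public static String contentType(String filename, Map<String, String> configured) {
        int dot = filename.lastIndexOf('.');
        String ext = (dot < 0) ? "" : filename.substring(dot + 1);
        // Unmapped extensions are treated as binary
        String type = configured.getOrDefault(ext, OCTET_STREAM);
        if (OCTET_STREAM.equals(type) && DEFAULTS.containsKey(ext)) {
            return DEFAULTS.get(ext);
        }
        return type;
    }

    public static void main(String[] args) {
        Map<String, String> configured = Map.of("xpl", "application/xml");
        System.out.println(contentType("pipe.xpl", configured)); // application/xml (configured mapping kept)
        System.out.println(contentType("notes.md", configured)); // text/markdown (fallback default)
        System.out.println(contentType("data.bin", configured)); // application/octet-stream
    }
}
```

The key design point is that the default never overrides an explicit mapping: .xpl stays application/xml here because it was configured, even though the table lists application/xproc+xml.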
Table 1.1. Default MIME type mappings

Extension | Default MIME type |
---|---|
.7z | application/x-7z-compressed |
.a | application/x-archive |
.arj | application/x-arj |
.bmp | image/bmp |
.bz2 | application/bzip2 |
.cpio | application/x-cpio |
.css | text/plain |
.csv | text/csv |
.dtd | application/xml-dtd |
.epub | application/epub+zip |
.fo | application/xml |
.gz | application/gzip |
.gzip | application/gzip |
.jar | application/java-archive |
.json | application/json |
.jsonld | application/ld+json |
.lzma | application/lzma |
.md | text/markdown |
.n3 | text/n3 |
.nq | application/n-quads |
.nt | application/n-triples |
.nvdl | application/nvdl+xml |
.pdf | application/pdf |
.rdf | application/rdf+xml |
.rj | application/rdf+json |
.rnc | application/relax-ng-compact-syntax |
.rng | application/relax-ng+xml |
.rq | application/sparql-query |
.sch | application/schematron+xml |
.srj | application/sparql-results+json |
.srx | application/sparql-results+xml |
.svg | image/svg+xml |
.tar | application/x-tar |
.thrift | application/rdf+thrift |
.toml | application/toml |
.trig | application/trig |
.trix | application/trix+xml |
.ttl | text/turtle |
.xml | application/xml |
.xpl | application/xproc+xml |
.xq | application/xquery |
.xql | application/xquery |
.xqm | application/xquery |
.xquery | application/xquery |
.xqy | application/xquery |
.xsd | application/xsd+xml |
.xsl | application/xslt+xml |
.xslt | application/xslt+xml |
.xz | application/xz |
.yaml | application/x-yaml |
.yml | application/x-yaml |
.zip | application/zip |
Extensions to XML Calabash may also update this table. To see the mappings actually in effect on your system, use the info mimetypes or info mimetype command. If the defaults are problematic, you can override them with any of the existing configuration mechanisms.
If you follow a different convention, perhaps using the extension “.schematron” for Schematron files or “.xs” for XML Schema documents, you can provide those mappings as well. You will also want to provide mappings for any other extensions you use.
Extension features
XML Calabash supports several extensions, features that are in addition to or different from the XProc specification:
- Implicit validation
Automatically performs XML Schema validation on the primary input ports. See Chapter 4, Implicit validation.
- Pipeline assertions
Schematron assertions can be applied automatically to step inputs and outputs. See Chapter 9, Pipeline assertions.
- Eager URI resolution
There is (or was, at the time of this release) an open issue about precisely how and when xs:anyURI typed values should be resolved, as well as a discussion thread about it on the xproc-dev mailing list. See below.
Eager URI resolution
The status quo is that relative URI values (that is, relative values with an explicit type of xs:anyURI) are resolved “late”, at the point of use, not at the point of definition. The canonical example is this:
There is a library in file:///path/to/location/lib/lib.xpl:
```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:s="http://example.com/ns/steps"
                name="main" version="3.1" type="s:libidentity"
                exclude-inline-prefixes="s">
  <p:output port="result"/>
  <p:option name="file" as="xs:anyURI"/>

  <p:load>
    <p:with-option name="href" select="$file"/>
  </p:load>

</p:declare-step>
```
The pipeline file:///path/to/location/pipe.xpl imports this library and uses it:
```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:s="http://example.com/ns/steps"
                name="main" version="3.1"
                exclude-inline-prefixes="s">
  <p:import href="lib/lib.xpl"/>
  <p:output port="result"/>

  <s:libidentity file="document.xml"/>
</p:declare-step>
```
If you run this pipeline with the status quo interpretation, the relative URI document.xml is passed to s:libidentity. The relative URI is made absolute by the p:load step against the base URI of the p:with-option element, resulting in an attempt to load file:///path/to/location/lib/document.xml.
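The status quo behavior can be reproduced with java.net.URI, whose resolve method performs standard relative-reference resolution. In this hypothetical sketch (the class and method names are illustrative), the relative value is carried unchanged into the library and only resolved at the point of use, against the base URI of lib.xpl, which is also the base URI of the p:with-option element in that file. Only the path is returned, since the scheme is file: throughout.

```java
import java.net.URI;

// Hypothetical sketch of "late" resolution: the option value stays relative
// until p:load uses it, then it is resolved against the library's base URI.
public class LateResolutionSketch {

    public static String resolveLate(String optionValue, URI baseAtPointOfUse) {
        // URI.resolve leaves absolute URIs unchanged and resolves relative ones
        return baseAtPointOfUse.resolve(optionValue).getPath();
    }

    public static void main(String[] args) {
        URI libBase = URI.create("file:///path/to/location/lib/lib.xpl");
        // pipe.xpl passes "document.xml" to s:libidentity unchanged
        System.out.println(resolveLate("document.xml", libBase));
        // prints /path/to/location/lib/document.xml
    }
}
```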
Notwithstanding the fact that all known implementations work this way, the specification seems to say that this is incorrect. In §14.3, we find:
If a relative URI appears in an option of type xs:anyURI, the base URI against which it must be made absolute is the base URI of the p:option element. If the option value is specified using a syntactic shortcut, the base URI of the step element on which the shortcut attribute appears must be used. In general, whenever a relative URI appears in an xs:anyURI, its base URI is the base URI of the nearest ancestor element.
If you run the pipeline above with this interpretation, the relative URI document.xml is made absolute against the base URI of the s:libidentity step on which it occurs. The absolute URI is passed to the p:load step, resulting in an attempt to load file:///path/to/location/document.xml.
If the “eager-uri-resolution” feature is enabled (on the command line or in a configuration file), XML Calabash attempts to implement this behavior:
If an expression has a type derived from xs:anyURI, if its value is not the empty sequence, and if there is an in-scope base URI at the point where the expression is defined, the result of evaluating the expression will immediately be made absolute with respect to that base URI.
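The rule quoted above can be sketched in Java with its three conditions made explicit. This is a hypothetical model, not XML Calabash’s implementation: the class EagerResolutionSketch and the method evaluate are illustrative names, an empty Optional stands in for the empty sequence, a boolean flag stands in for “has a type derived from xs:anyURI”, and a null base stands in for “no in-scope base URI”. The main method replays the canonical example and also shows that an xs:string value stays relative.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical sketch of the eager rule. Optional.empty() models the empty
// sequence; a null base models "no in-scope base URI".
public class EagerResolutionSketch {

    public static Optional<String> evaluate(Optional<String> value, boolean isAnyUri, URI inScopeBase) {
        if (!isAnyUri || !value.isPresent() || inScopeBase == null) {
            // Rule does not apply: the value is kept exactly as evaluated
            return value;
        }
        // Rule applies: make the value absolute where the expression is defined
        return value.map(v -> inScopeBase.resolve(v).toString());
    }

    public static void main(String[] args) {
        URI pipeBase = URI.create("file:///path/to/location/pipe.xpl");
        URI libBase = URI.create("file:///path/to/location/lib/lib.xpl");

        // The option on s:libidentity is resolved against pipe.xpl immediately
        String passed = evaluate(Optional.of("document.xml"), true, pipeBase).get();

        // p:load later resolves against lib.xpl, but an absolute URI is unchanged
        System.out.println(libBase.resolve(passed).getPath());
        // prints /path/to/location/document.xml

        // Declared as xs:string instead, the value stays relative
        System.out.println(evaluate(Optional.of("document.xml"), false, pipeBase).get());
        // prints document.xml
    }
}
```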
With this implementation (and a tweak to p:namespace-rename described below), all of the test suite tests still pass. This isn’t surprising, because there are no tests that explicitly attempt to test this behavior, and almost all options of type xs:anyURI are explicitly defined to behave this way in the specifications.
Nevertheless, it has a couple of consequences:
- It is no longer possible to pass around relative URIs in options or variables with the type xs:anyURI. Use xs:string if you wish to pass around relative URIs.
- The from and to options on p:namespace-rename are defined as xs:string instead of xs:anyURI. (This isn’t, in any practical way, a user-visible change.)
It’s difficult to assess the consequences of this change. For most users, it will only affect a small percentage of pipelines, because a pipeline is affected only if all of the following are true:
- The pipeline makes use of a user-defined step.
- That step has an option explicitly declared with the type xs:anyURI.
- When the user-defined step is called, a relative URI is passed as the value of that option.
- The base URIs of the calling step and the called step are different.
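To check a specific call, one can compare the two resolutions directly. The hypothetical helper below (the class EagerImpactSketch and method behaviorChanges are illustrative names, not part of XML Calabash) resolves the option value against the caller’s base URI, as eager resolution would, and against the called step’s base URI, as late resolution would. Strictly, the last condition is only necessary, not sufficient: two different base URIs in the same directory resolve a relative value the same way, and an absolute value is never affected.

```java
import java.net.URI;

// Hypothetical helper: would enabling eager resolution change which resource
// a relative xs:anyURI option value refers to?
public class EagerImpactSketch {

    public static boolean behaviorChanges(String value, URI callerBase, URI calleeBase) {
        // Eager: resolved against the base URI of the calling step
        URI eager = callerBase.resolve(value);
        // Late: resolved against the base URI where the value is used
        URI late = calleeBase.resolve(value);
        return !eager.equals(late);
    }

    public static void main(String[] args) {
        URI pipe = URI.create("file:///path/to/location/pipe.xpl");
        URI lib = URI.create("file:///path/to/location/lib/lib.xpl");
        URI sibling = URI.create("file:///path/to/location/other.xpl");

        System.out.println(behaviorChanges("document.xml", pipe, lib));       // true
        System.out.println(behaviorChanges("document.xml", pipe, sibling));   // false: same directory
        System.out.println(behaviorChanges("file:///x/doc.xml", pipe, lib));  // false: already absolute
    }
}
```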
I would very much like to hear from users about any pipelines they have where enabling the “eager-uri-resolution” feature changes the behavior of the pipeline.