Chapter 1 Installation

XML Calabash version 3.0.0-alpha14 is still very much an alpha release. While it’s in alpha/beta, it’s not being published anywhere except on the releases page. That release is the command-line application; library releases will be published as well in the future.

Download the latest release and unzip it on your filesystem. That will create an xmlcalabash-3.0.0-alpha14/ directory containing README.md, xmlcalabash-3.0.0-alpha14.jar, and a lib/ directory that holds many jar files. The application ships with all of the dependencies necessary to run every step, including the extension steps.

System configuration

All JVM applications are sensitive to some aspects of your system configuration: for example, which version of the Java Virtual Machine (JVM) you’re using and whether you need a proxy to connect to the internet.

MIME Types

An XProc processor is especially sensitive to the way MIME types are configured. An XProc pipeline can process resources of any type: XML, JSON, ZIP, etc. The content type of a resource accessed through HTTP (or HTTPS) is always identified by the server. But the content type of a resource loaded from the filesystem depends on its filename extension and how your JVM is configured.

MIME type, media type, or content type?

Yes. To quote from the Wikipedia article on media types:

The IANA and IETF use the term “media type”, and consider the term "MIME type" to be obsolete, since media types have become used in contexts unrelated to email, such as HTTP. By contrast, the WHATWG continues to use the term “MIME type” and discourages use of the term “media type” as ambiguous, since it is used with a different meaning in connection with the CSS @media feature.

The HTTP response header for providing the media type is Content-Type. The W3C has used ContentType as an XML data-type name for a media type. XDG specifications implemented by Linux desktop environments continue to use the term “MIME type”.

Following the lead of W3C with respect to XML, XProc consistently uses the term “content type”. In this user guide the term “MIME type” is used where that’s consistent with what the JVM documentation uses.

When the JVM starts, it looks for a file named “.mime.types” in the user’s home directory*. If it finds one, it uses it to build a mapping from filename extensions to MIME types. The .mime.types file is a text file where each line consists of a MIME type followed by a space-separated list of filename extensions. For example:

    # MIME type mapping (Lines that start with “#” are comments.)
    application/json json
    application/nvdl+xml nvdl
    application/relax-ng-compact-syntax rnc
    application/relax-ng+xml rng
    application/schematron+xml sch
    text/plain text txt css
    application/xml xml xpl fo
    application/xquery xq xqy
    application/xsd+xml xsd
    application/xslt+xml xsl xslt
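To make the file format concrete, here is a minimal Java sketch that reads text in the .mime.types format into a map from filename extension to MIME type. It illustrates the format described above; it is not the loader that XML Calabash or the JVM actually uses. The class and method names (MimeTypesParser, parse) are hypothetical, and the rule that a later line overwrites an earlier mapping for the same extension is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MimeTypesParser {
    // Hypothetical helper: parses .mime.types-style text into a map from
    // filename extension to MIME type.
    public static Map<String, String> parse(String text) {
        Map<String, String> byExtension = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            // Skip blank lines and comment lines (those starting with "#").
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // The first token is the MIME type; the remaining tokens are extensions.
            String[] tokens = trimmed.split("\\s+");
            for (int i = 1; i < tokens.length; i++) {
                // Assumption of this sketch: a later line overwrites an
                // earlier mapping for the same extension.
                byExtension.put(tokens[i], tokens[0]);
            }
        }
        return byExtension;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "# MIME type mapping (Lines that start with \"#\" are comments.)",
            "application/json json",
            "text/plain text txt css",
            "application/xml xml xpl fo");
        Map<String, String> byExtension = parse(sample);
        System.out.println(byExtension.get("css")); // text/plain
        System.out.println(byExtension.get("xpl")); // application/xml
        System.out.println(byExtension.get("rnc")); // null: no mapping in this sample
    }
}
```

An extension that appears on no line simply has no entry in the map; what happens to such an extension is decided by whatever code consults the mapping.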

With XML Calabash, you can also define these mappings in the configuration file.

If there is no MIME type defined for a particular extension, it will be identified as an “application/octet-stream” resource. That’s binary. It is very explicitly not XML or HTML or JSON or text. Most steps will reject binary resources. This will result in errors that might be very confusing if you’re not aware of the problem. To avoid that, XML Calabash takes a heavy-handed approach.

After MIME types have been configured (from the system, from the user’s .mime.types file, and from the configuration file), XML Calabash checks the extensions listed in Table 1.1, “Default MIME type mappings”. If any of those extensions is still identified as an “application/octet-stream” resource, XML Calabash assigns it the pragmatically more useful default from the table.

Table 1.1 Default MIME type mappings

    Extension    Default MIME type
    ---------    -----------------
    7z           application/x-7z-compressed
    a            application/x-archive
    arj          application/x-arj
    bmp          image/bmp
    bz2          application/bzip2
    cpio         application/x-cpio
    css          text/plain
    eps          image/eps
    epub         application/epub+zip
    fo           application/xml
    gif          image/gif
    gz           application/gzip
    gzip         application/gzip
    jar          application/java-archive
    jpg, jpeg    image/jpeg
    json         application/json
    lzma         application/lzma
    nvdl         application/nvdl+xml
    pdf          application/pdf
    rnc          application/relax-ng-compact-syntax
    rng          application/relax-ng+xml
    sch          application/schematron+xml
    svg          image/svg+xml
    tar          application/x-tar
    text         text/plain
    txt          text/plain
    xml          application/xml
    xpl          application/xml
    xq           application/xquery
    xqy          application/xquery
    xsd          application/xsd+xml
    xsl          application/xslt+xml
    xslt         application/xslt+xml
    xz           application/xz
    zip          application/zip
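The override rule described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not XML Calabash’s implementation: the class and method names (DefaultMimeTypes, contentType) are hypothetical, the configured map stands in for the combined system, .mime.types, and configuration-file mappings, and the extension is taken to be everything after the last dot in the filename. The DEFAULTS map transcribes Table 1.1.

```java
import static java.util.Map.entry;

import java.util.Map;

public class DefaultMimeTypes {
    public static final String OCTET_STREAM = "application/octet-stream";

    // Defaults transcribed from Table 1.1, keyed by filename extension.
    public static final Map<String, String> DEFAULTS = Map.ofEntries(
        entry("7z", "application/x-7z-compressed"), entry("a", "application/x-archive"),
        entry("arj", "application/x-arj"), entry("bmp", "image/bmp"),
        entry("bz2", "application/bzip2"), entry("cpio", "application/x-cpio"),
        entry("css", "text/plain"), entry("eps", "image/eps"),
        entry("epub", "application/epub+zip"), entry("fo", "application/xml"),
        entry("gif", "image/gif"), entry("gz", "application/gzip"),
        entry("gzip", "application/gzip"), entry("jar", "application/java-archive"),
        entry("jpg", "image/jpeg"), entry("jpeg", "image/jpeg"),
        entry("json", "application/json"), entry("lzma", "application/lzma"),
        entry("nvdl", "application/nvdl+xml"), entry("pdf", "application/pdf"),
        entry("rnc", "application/relax-ng-compact-syntax"),
        entry("rng", "application/relax-ng+xml"),
        entry("sch", "application/schematron+xml"), entry("svg", "image/svg+xml"),
        entry("tar", "application/x-tar"), entry("text", "text/plain"),
        entry("txt", "text/plain"), entry("xml", "application/xml"),
        entry("xpl", "application/xml"), entry("xq", "application/xquery"),
        entry("xqy", "application/xquery"), entry("xsd", "application/xsd+xml"),
        entry("xsl", "application/xslt+xml"), entry("xslt", "application/xslt+xml"),
        entry("xz", "application/xz"), entry("zip", "application/zip"));

    // Hypothetical lookup: "configured" represents every mapping already
    // defined by the system, ~/.mime.types, and the configuration file.
    public static String contentType(String filename, Map<String, String> configured) {
        int dot = filename.lastIndexOf('.');
        String ext = dot < 0 ? "" : filename.substring(dot + 1);
        // An extension with no configured mapping is treated as binary.
        String type = configured.getOrDefault(ext, OCTET_STREAM);
        // Only a binary result is replaced; an explicit non-binary mapping wins.
        if (OCTET_STREAM.equals(type) && DEFAULTS.containsKey(ext)) {
            type = DEFAULTS.get(ext);
        }
        return type;
    }

    public static void main(String[] args) {
        // Hypothetical configuration: css mapped to text/css, a custom
        // "schematron" extension, and rnc explicitly mapped to binary.
        Map<String, String> configured = Map.of(
            "css", "text/css",
            "schematron", "application/schematron+xml",
            "rnc", OCTET_STREAM);
        System.out.println(contentType("style.css", configured));       // text/css
        System.out.println(contentType("rules.schematron", configured)); // application/schematron+xml
        System.out.println(contentType("grammar.rnc", configured));      // application/relax-ng-compact-syntax
        System.out.println(contentType("pipeline.xpl", configured));     // application/xml
        System.out.println(contentType("notes.xyz", configured));        // application/octet-stream
    }
}
```

In this sketch, an explicit non-binary mapping (text/css for css in the example) is left alone; only extensions that still resolve to application/octet-stream receive the table’s default, and an extension absent from both the configuration and the table stays binary.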

If these defaults are problematic, use one of the existing configuration mechanisms to define the mapping you prefer.

Also, if you have a different convention, perhaps using the extension “.schematron” for Schematron files or “.xs” for XML Schema documents, you will have to provide that mapping yourself. You will also want to provide mappings for any other extensions you use that the defaults don’t cover.