Chapter 2. Running XML Calabash
The easiest way to run XML Calabash is directly from the jar file:
    java -jar xmlcalabash-3.0.23.jar options…
Run this way, all of the dependencies included in the lib directory are automatically included on the classpath.
You can also run the main class directly, for example in a build system or a shell script:
    java com.xmlcalabash.app.Main options…
But if you choose this form, you must ensure that the classpath contains both the XML Calabash jar file and all of the necessary dependencies. One advantage of this second form is that it allows you to change or update the dependencies.
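A minimal sketch of the second form on a Unix-like system. It assumes the distribution was unpacked into a directory named by CALABASH_HOME, with the jar file and the lib directory side by side; the variable and both function names are hypothetical, not something XML Calabash defines. It relies on the standard Java classpath wildcard, which picks up every jar file in a directory:

```shell
# CALABASH_HOME is a hypothetical variable naming the unpacked distribution.
CALABASH_HOME="${CALABASH_HOME:-$HOME/xmlcalabash-3.0.23}"

# Build a classpath containing the XML Calabash jar plus every jar in lib/.
# The trailing "/*" is expanded by the JVM, not by the shell, so it must
# reach java unexpanded (hence the quoting below).
calabash_classpath() {
    printf '%s/xmlcalabash-3.0.23.jar:%s/lib/*' "$1" "$1"
}

# Run the main class directly with that classpath, forwarding all arguments.
run_calabash_main() {
    java -cp "$(calabash_classpath "$CALABASH_HOME")" com.xmlcalabash.app.Main "$@"
}
```

With this arrangement, changing a dependency amounts to changing which jar files the classpath names. On Windows the classpath separator is ";" rather than ":".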
XML Calabash is also available from Maven with the coordinates com.xmlcalabash:xmlcalabash:3.0.23.
The one library that you are likely to want to change is Saxon. If you have a license for Saxon PE or Saxon EE, it makes sense to swap out the Saxon HE library that ships with XML Calabash for PE or EE.
To save you the trouble of constructing and managing a large classpath, the
distribution of XML Calabash is configured with a hack. Simply
delete the Saxon-HE-12.9.jar file from the
lib directory and copy in your PE or EE jar instead. You
must use a 12.9 jar file for this purpose!
Be careful to make sure there’s only one Saxon jar file in the
lib directory. Having multiple versions of the same library on the classpath
is an invitation for subtle and mysterious crashes.
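A quick way to guard against a stray second Saxon jar is to count them before running. This is only a sketch: the function names are hypothetical, and the file name pattern is an assumption based on the Saxon-HE-12.9.jar name used above, so adjust it if your PE or EE jar is named differently.

```shell
# Count the Saxon HE/PE/EE jars directly inside a directory.
# The glob pattern is an assumption about how the jar files are named.
count_saxon_jars() {
    count=0
    for jar in "$1"/[Ss]axon-[HhPpEe][Ee]-*.jar; do
        # An unmatched glob is left as literal text, so confirm the file exists.
        if [ -e "$jar" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Succeed only when exactly one Saxon jar is present.
check_single_saxon() {
    [ "$(count_saxon_jars "$1")" -eq 1 ]
}
```

For example, `check_single_saxon lib || echo "expected exactly one Saxon jar in lib" >&2` could be run after swapping the jar files.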
Saxon generally searches for a license file on the classpath, which isn’t going
to work with this hack. Instead, you can use the SAXON_HOME environment
variable, as described in the documentation.
Elsewhere in this document, we assume that the command “xmlcalabash”
runs XML Calabash. This can be replaced by the java -jar … version above or
by your own script.
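One way to provide such a command is a small shell function (or an equivalent script on your PATH). The sketch below assumes a particular jar location; XMLCALABASH_JAR and JAVA are hypothetical variables introduced only for this illustration.

```shell
# Both variables are hypothetical; adjust them to your installation.
XMLCALABASH_JAR="${XMLCALABASH_JAR:-$HOME/xmlcalabash-3.0.23/xmlcalabash-3.0.23.jar}"
JAVA="${JAVA:-java}"

# Forward every argument, unchanged, to the jar.
xmlcalabash() {
    "$JAVA" -jar "$XMLCALABASH_JAR" "$@"
}
```

Using "$@" rather than $* keeps arguments containing spaces or shell-special characters intact.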
XML Calabash has a command-driven command-line interface. It supports
three commands: run, to run a pipeline;
info, to print diagnostic information; and
help, to display help. If no command is
given, the run command is assumed.
The run command
[--pipe] [--input:[type@]port=uri…] [--output:port=filespec…]
[--namespace:prefix=uri…] [--init:class-name…]
[--graphs:graph-output-directory]
[--verbosity:verbosity] [--explain] [--visualizer:name]
[--trace:output-file] [--trace-documents:output-directory]
[--assertions:level] [--extension:name…]
[--stacktrace] [--licensed] [--debug] [--debugger] [--nogo]
[--catalog:catalog-file-uri…] [--xml-schema:xml-schema-file-uri…]
[--validation-mode:mode] [--try-namespaces] [--use-location-hints] [--help]
[--step:step-name] [pipeline.xpl] [option=value…] [!serialparam=value…]
Where the options and arguments are:
run, selects the run command
If the command is omitted, run is assumed.
--configuration:configuration-file, identifies a configuration file
When XML Calabash begins, it reads configuration settings from a configuration file. If you don’t specify a configuration file, it will search first for .xmlcalabash3 in the current directory and then in your home directory.
--pipe:boolean, sets piped mode
If piped mode is enabled, XML Calabash will behave like a Unix pipeline. If there is no binding for the primary input port, it will read it from standard input. If there is no binding for the primary output port, it will write it to standard output. If not specified, the default setting comes from the piped-io configuration setting.
--input:[type@]port=uri, identifies an input
This option associates the resource identified by the uri with the input port named port. The pipeline must have an input port named port.
If type is provided, it is the content type that will be used to parse the input; it must be a valid content type. If no content type is provided, the content type will be inferred from the URI, if possible.
If the --input option is repeated for the same port, the resources become a sequence of documents on that port, in the order specified.
If the input uri provided is “-”, input will be read from standard input. When standard input is read, if no type is provided, it will be parsed as the first usable content type listed on the named input port. It will be parsed as XML if the input port doesn’t specify any usable content types.
Note: A content type is “usable” if it is not a negated type, if it doesn’t use wildcards for the type or subtype, or if it’s recognized as a “+xml” or “+json” content type.
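To illustrate the --input forms described above, here is a sketch of two small shell helpers that build the arguments. The helper names, the port name “source”, and the file names are all hypothetical; only the argument syntax comes from the description above.

```shell
# Print one --input argument per document, all bound to the same port,
# in the order given.
# Usage: input_args PORT URI...
input_args() {
    port=$1
    shift
    for uri in "$@"; do
        printf '%s%s=%s\n' '--input:' "$port" "$uri"
    done
}

# Print an --input argument with an explicit content type.
# Usage: typed_input_arg TYPE PORT URI
typed_input_arg() {
    printf '%s%s@%s=%s\n' '--input:' "$1" "$2" "$3"
}

# A sequence of two documents on the "source" port:
#   xmlcalabash $(input_args source one.xml two.xml) pipeline.xpl
# JSON read from standard input:
#   xmlcalabash "$(typed_input_arg application/json source -)" pipeline.xpl < data.json
```

The unquoted $(input_args …) relies on word splitting, so it only suits file names without spaces.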
--output:port=filespec, identifies an output
This option determines how the output to port is stored. If port is omitted, the primary output port is assumed.
In the simplest case, at most one document appears on the output port and filespec is just a filename: for example, --output:result=saved.xml to save the output from the “result” port into a file named “saved.xml”. If the filename is not an absolute path, it is made absolute relative to the current working directory.
The filespec can also be a template. The output documents are numbered, starting at 1. Within the filespec, the string “%d” will be replaced by the document number. (The string “%x” will be replaced by the document number in hexadecimal and “%o” will be replaced by the document number in octal.) A width can be specified between the % and the format specifier. For example, “%02d” will produce a document number that is at least two digits long, padded on the left with a 0 when the document number is less than 10. (If the leading 0 is omitted, the field will be padded with spaces; %17x specifies a width of 17 hexadecimal digits, padded on the left with spaces.) For completeness, a literal “%” can be added to the filename with “%%” in the filespec.
If the filespec does not contain a number template, then all of the outputs on the port will be concatenated to the same file. You may not repeat the --output option for the same port.
The pipeline must have an output port named port.
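The number placeholders follow the same conventions as the shell’s printf, so printf can be used to preview the filename a template would produce for a given document number. This is an approximation for illustration, not XML Calabash’s own code, and it may differ in edge cases; the function name and the filenames are hypothetical.

```shell
# Expand a filespec template for one document number using printf.
# Assumes the template contains exactly one number placeholder.
expand_filespec() {
    # The template is deliberately used as the printf format string.
    # shellcheck disable=SC2059
    printf "$1\n" "$2"
}

expand_filespec 'chapter-%02d.xml' 7     # chapter-07.xml
expand_filespec 'part-%x.xml' 255        # part-ff.xml
expand_filespec 'part-%o.xml' 8          # part-10.xml
expand_filespec 'done-100%%-%d.xml' 3    # done-100%-3.xml
```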
If the output filespec provided is “-”, output from that port will be written to standard output. At most one output port may be explicitly bound to standard output.
In piped mode, at most one port will write to standard output. If there is no explicit binding to standard output, the primary output port writes to standard output. If the primary output port is bound to some other output, no output will be sent to standard output.
If piped mode is not enabled, and no port is explicitly bound to standard output, the pipeline will write the output from all otherwise unbound output ports to standard output. If exactly one port is being written to standard output, and if that port cannot produce a sequence, then the output is written directly to standard output.
If more than one port may appear on standard output, or if a sequence of documents may appear, and if standard output appears to be going to a terminal window, “decoration” is added as an aid to comprehension:
A header is printed before the output identifying the port name, document number, and base URI.
A line of equal signs is printed as a separator between documents (if more than one document is output).
When XML or HTML content is output, the XML declaration is omitted.
Indentation is turned on.
The method for determining whether output is going to a terminal or being redirected isn’t terribly sophisticated and may be wrong in some circumstances. It’s safer to explicitly enable piped mode or use --output to write to a file if you want to save the output.
--namespace:prefix=uri, identifies a namespace binding
Binds the specified prefix to the uri. This has no effect on the pipeline; these bindings are only used when evaluating the --step option and in expressions used to define options and serialization parameters.
The default namespace bindings on the command line are:
    Prefix  Namespace URI
    array   http://www.w3.org/2005/xpath-functions/array
    cx      http://xmlcalabash.com/ns/extensions
    fn      http://www.w3.org/2005/xpath-functions
    map     http://www.w3.org/2005/xpath-functions/map
    math    http://www.w3.org/2005/xpath-functions/math
    p       http://www.w3.org/ns/xproc
    saxon   http://saxon.sf.net/
    xs      http://www.w3.org/2001/XMLSchema
The --namespace option can add to, or change, these bindings.
--init:class-name, Saxon configuration initializer
Attempts to load and execute the class named class-name, which must be available on the class path and must implement the net.sf.saxon.lib.Initializer interface. (This is analogous to the -init: option on the Saxon command line API.)
--graphs:graph-output-directory, SVG graph outputs
This option writes hyperlinked SVG descriptions of pipelines, and their corresponding graphs, to the graph-output-directory. The descriptions are “boxes and arrows” diagrams of the connections between the steps. One SVG diagram is produced for each declared pipeline and its corresponding graph. An HTML index in the graph-output-directory makes them easy to browse. See Chapter 4, Pipelines vs. Graphs for more details.
The processor assumes that it “owns” the graph-output-directory; it will erase files and directories before creating the graph output.
Note: Some browsers have better support for SVG than others. If the diagrams are difficult to view, or if links don’t work correctly, a good first step is to try a different browser.
--verbosity:verbosity
There are five levels of verbosity. The level of verbosity determines how much detail is printed about the progress of a running pipeline.
    trace: Lots of detail, show everything.
    debug: Lots of detail.
    info: Show relevant status messages.
    warn: Show only warnings and errors.
    error: No messages except fatal errors.
Setting the verbosity also sets the logging level.
--explain, explain errors
Enables error explanations. If error explanations are enabled, when a pipeline fails, in addition to the error message, a short explanation of the cause will be provided.
--visualizer:name
Selects a visualizer for reporting pipeline progress. There are three defined visualizers.
    silent: Shows no progress reports.
    plain: Reports the name of each step when it begins running. Most steps manufactured automatically during graph construction are omitted. There is one option, indent, which determines whether or not, and to what extent, reports are indented when they are nested inside compound steps.
    detail: Reports the name of each step and may report on the documents produced.
        If the steps option is true, the progress of steps is recorded. (Defaults to true.)
        If the documents option is true, the documents produced during execution are recorded. (Defaults to false.)
Options can be specified on the command line as key=value pairs after the name. The name and any options are separated by a question mark (?). Key-value pairs are separated by semicolons (;). For example: --visualizer=detail?steps=true;documents=true.
--trace:output-file, defines a runtime trace file
A runtime trace of the pipeline execution will be written to output-file. (If the --trace-documents option is given but the --trace option is not, the default output file is named trace.xml in the --trace-documents output directory.)
For more information about tracing, see Appendix B, Tracing execution.
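Returning to the visualizer options described above: both “?” and “;” are special characters in most shells (the first is a glob character, the second ends a command), so the whole argument should be quoted when it is typed at a prompt or written in a script. The sketch below assembles such an argument from its parts, mirroring the --visualizer=detail?steps=true;documents=true example; the helper name is hypothetical.

```shell
# Join a visualizer name and its key=value options into a single argument.
# Usage: visualizer_arg NAME [KEY=VALUE...]
visualizer_arg() {
    name=$1
    shift
    opts=
    for pair in "$@"; do
        # Separate successive pairs with a semicolon.
        opts="${opts:+$opts;}$pair"
    done
    # Add the question mark only when there is at least one option.
    printf '%s%s%s\n' '--visualizer=' "$name" "${opts:+?$opts}"
}

# Either of these passes the option as one intact argument:
#   xmlcalabash "$(visualizer_arg detail steps=true documents=true)" pipeline.xpl
#   xmlcalabash '--visualizer=detail?steps=true;documents=true' pipeline.xpl
```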
--trace-documents:output-directory, defines a runtime trace directory
If the --trace-documents option is given, the runtime trace will write a copy of every document that flows through the pipeline into the specified directory. These documents are identified in the --trace file.
--assertions:level
By default, assertions are disabled. You can enable them by setting an appropriate level:
    error: Schematron reports are output and Schematron errors are treated as errors. (They cause the step to throw Q{http://xmlcalabash.com/ns/error}XI0041.)
    warning (or warn): Schematron reports are output and Schematron errors are treated as warnings.
    ignore: Assertions are ignored.
--licensed
Requests a licensed processor. In practice, a licensed processor is used by default, if one is available. However, --licensed:false can be used to explicitly request an unlicensed processor when Saxon PE or Saxon EE is on the classpath.
Schema-aware processing requires Saxon EE and a valid Saxon license.
--debug
The --debug option, the --verbosity option, and the configuration of the underlying logging framework are interrelated in ways that aren’t entirely ideal. Originally, the --debug option dynamically changed the underlying logger configuration, but that isn’t completely portable across logging frameworks, so it doesn’t try to do that anymore.
If you specify --debug, the verbosity level will be set to at least “debug” and some small details of operation change; for example, the intermediate files used to generate graphs are kept rather than deleted. Basically, it’s not much different from --verbosity:debug.
Much more information is available through the logging framework; see Chapter 6, Messages and logging, for a more detailed discussion. In particular, if you’re trying to understand what the processor is doing, configuring the logging back end to emit or save “debug” level messages will reveal a lot of detail.
--debugger
Start an interactive debugging session on the pipeline. See Chapter 7, The interactive debugger.
--stacktrace
If the stacktrace option is enabled, a stack trace (really a “step trace”) will be printed if the pipeline fails at runtime. This trace will show the step that failed and its ancestors.
--nogo
Compile the pipeline, and produce graphs if they’re requested, but don’t actually run the pipeline.
--catalog:catalog-file-uri
Adds the specified catalog to the list of XML Catalogs that are used during resource resolution.
--xml-schema:xml-schema-file-uri
Adds the specified XML Schema to the validation context. This schema will be available in both implicit validation and when the p:validate-with-xml-schema step is used.
--validation-mode:mode
This option enables implicit validation. The mode must be either “lax” or “strict”.
--try-namespaces
This option enables attempting to retrieve a schema using its namespace URI during implicit validation.
--use-location-hints
This option enables attempting to retrieve a schema using location hints during implicit validation.
--extension:name
This option enables the extension named name.
--help
This is equivalent to issuing the help command. It’s provided as an option for convenience.
--step:step, identifies a step
The --step option is a little bit overloaded. Its interpretation depends on whether or not a pipeline is specified.
If a pipeline is specified, it must identify a p:library. The value of the --step option is interpreted as the name of a step in that library. The named step will be run.
For example:
    $ xmlcalabash --step:helloWorld --library:examples.xpl …
If no pipeline is specified, the value of the --step option is interpreted as the type of a step. The (atomic) step with the specified type will be run. All of the inputs, outputs, and options specified apply to that step.
For example:
    $ xmlcalabash --step:p:xslt …
The p and cx namespaces are bound by default. The --namespace option can be used to change the namespace bindings.
pipeline.xpl, the pipeline to run
This identifies the pipeline to run. If the root element is p:declare-step, then that pipeline will be run. If the root element is p:library, the first pipeline in the library will be run, unless the --step option specifies an alternate pipeline.
option=value, sets an option
You can provide values for pipeline options on the command line. These override any defaults declared in the pipeline. There must be a pipeline option (or a static option) named option.
If the option name includes “::”, then it is treated as a shortcut for setting the value in a map. In this case, the name before the “::” is the name of the actual parameter and the name after the “::” is the name of a key in that map. (With the additional special case that if the name begins with “::”, then the option name is taken to be the default option name, currently parameters.) This is intended to be somewhat reminiscent of XPath axes.
In both cases, the name can be an EQName or it can use a prefix previously defined with --namespace; if the name is a simple NCName, it is not in a namespace.
If the value begins with “?”, what follows is taken to be an XPath expression. That expression is evaluated using the namespace bindings defined. The context item is undefined. The result of evaluating the expression is the value of the option. If the value does not begin with a “?”, the whole string becomes the value as an xs:untypedAtomic.
If multiple assignments to the same option (or map item) appear, the value will be a sequence with those values in the order specified.
For example, suppose that the following sequences of options are given:
    a=1 b=2
Sets the option a to 1 and the option b to 2.
    a=1 b=2+3
Sets the option a to 1 and the option b to “2+3”.
    a=1 a=2 b=?2+3
Sets the option a to (1, 2) and the option b to 5.
    a=1 parameters="map{'key': 'value'}"
Sets the option a to 1 and the option parameters to a map with the QName key “key” and the value “value”.
    a=1 parameters::key=value
Sets the option a to 1 and the option parameters to a map with the QName key “key” and the value “value”.
    a=1 parameters::key=value serialization::method=xml serialization::indent=true
Sets the option a to 1 and the option parameters to a map with the QName key “key” and the value “value” and the option serialization to a map with the QName key “method” with the value “xml” and the QName key “indent” with the value “true”.
    a=1 ::key=value serialization::method=xml serialization::indent=true
Sets the option a to 1 and the option parameters to a map with the QName key “key” and the value “value” and the option serialization to a map with the QName key “method” with the value “xml” and the QName key “indent” with the value “true”.
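The rules above for reading an option=value argument can be sketched as a small shell function. This is only an illustration of the rules as described, not XML Calabash’s own parser: the function name and output format are hypothetical, it does not evaluate XPath, and it assumes the name contains no “=”.

```shell
# Describe how an option=value argument would be read under the rules above.
# Prints: TARGET KIND VALUE, where KIND is "xpath" or "string".
describe_option() {
    arg=$1
    name=${arg%%=*}     # text before the first "="
    value=${arg#*=}     # text after the first "="
    case $name in
        ::*)  target="parameters[${name#::}]" ;;          # default map option
        *::*) target="${name%%::*}[${name#*::}]" ;;       # map::key shortcut
        *)    target=$name ;;                             # ordinary option
    esac
    case $value in
        \?*) kind=xpath; value=${value#\?} ;;             # leading "?" means XPath
        *)   kind=string ;;
    esac
    printf '%s %s %s\n' "$target" "$kind" "$value"
}

describe_option 'a=1'                    # a string 1
describe_option 'b=?2+3'                 # b xpath 2+3
describe_option 'parameters::key=value'  # parameters[key] string value
describe_option '::key=value'            # parameters[key] string value
```

Because “?” is a glob character in most shells, quoting arguments such as 'b=?2+3' on a real command line is a sensible precaution.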
Note that the use of serialization in these examples assumes that the step being run has an option named serialization. This doesn’t otherwise have any effect on the serialization options of any output port.
!serialparam=value, sets a serialization parameter
You can provide values for serialization parameters on the command line. These override any defaults declared in the pipeline. The serialparam name must be the name of a serialization parameter.
Two forms are used:
    portname::serialparam=value
In this form, the serialparam applies to the port named portname. It is an error if the pipeline has no port named portname.
    serialparam=value
In this form, the serialparam applies to the primary output port. It is an error if the pipeline has no primary output port.
In both cases, the name can be an EQName or it can use a prefix previously defined with --namespace; if the name is a simple NCName, it is not in a namespace.
If the value begins with “?”, what follows is taken to be an XPath expression. That expression is evaluated using the namespace bindings defined. The context item is undefined. The result of evaluating the expression is the value of the serialization parameter. If the value does not begin with a “?”, the whole string becomes the value as an xs:untypedAtomic.
If multiple assignments to the same serialparam appear, only the last value is used.
For example, suppose that the following sequences of options are given:
    !method=xml !indent=true
Sets the output method on the primary result port to “xml” and enables indentation.
    !method=xml !indent=true !secondary::indent=false
Sets the output method on the primary result port to “xml” and enables indentation. Disables indentation on the secondary output port.
    !indent=true !result::indent=false
Sets indent to true on the primary output port and to false on the port named result. If those are the same port, indentation will be disabled.
It is an error if the parameter value is not an atomic value.
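In an interactive bash or zsh session, an unquoted leading “!” can trigger history expansion, so it is safest to put serialization parameters in single quotes when typing them at a prompt. The sketch below builds the two forms described above; the helper name and the port name “secondary” are hypothetical.

```shell
# Build a serialization parameter argument.
# Usage: serial_arg NAME VALUE [PORT]
serial_arg() {
    if [ -n "${3:-}" ]; then
        # Port-specific form: !portname::serialparam=value
        printf '!%s::%s=%s\n' "$3" "$1" "$2"
    else
        # Primary output port form: !serialparam=value
        printf '!%s=%s\n' "$1" "$2"
    fi
}

# Equivalent ways to pass the parameters safely:
#   xmlcalabash pipeline.xpl '!indent=true' '!secondary::indent=false'
#   xmlcalabash pipeline.xpl "$(serial_arg indent true)" "$(serial_arg indent false secondary)"
```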
The help command
Displays a short summary of the command line options and arguments, not dissimilar to the preceding section. If help is requested, all of the other command line arguments are ignored.
The info command
(There used to be a version command; it has been generalized
into the info command. “version” still works as a command; it’s
a synonym for “info version”.)
The info version command
Displays the XML Calabash version and the version of Saxon:
    $ xmlcalabash info version
    XML Calabash version 3.0.23 (build 312e5ca0.19a.1b132c, 27 Oct 2025)
    Running with Saxon HE version 12.9 using at most 1 of 12 available threads
Most options are ignored when the version command is
used, but if the debug level of --verbosity
is requested, the version summary will include details about third party
dependencies such as the HTML parser and XML resolver. In this case, the output
is formatted in a way that can more easily be parsed, for example by a shell script.
    $ xmlcalabash info version --debug
    PRODUCT_NAME=XML Calabash
    VERSION=3.0.23
    BUILD_DATE=2025-10-27
    BUILD_ID=312e5ca0.19a.1b132c
    SAXON_EDITION=HE
    VENDOR_NAME=Norm Tovey-Walsh
    VENDOR_URI=https://xmlcalabash.com/
    THREADS=1
    MAX_THREADS=12
    ch.qos.logback:logback-classic=1.5.18
    com.fasterxml.jackson.dataformat:jackson-dataformat-csv=2.19.2
    com.fasterxml.jackson.dataformat:jackson-dataformat-toml=2.19.2
    com.fasterxml.jackson.dataformat:jackson-dataformat-yaml=2.19.2
    com.github.f4b6a3:uuid-creator=6.1.1
    com.networknt:json-schema-validator=1.5.8
    com.nwalsh:sinclude=5.5.0
    com.vladsch.flexmark:flexmark-all=0.64.8
    commons-codec:commons-codec=1.19.0
    javax.activation:activation=1.1.1
    name.dmaus.schxslt:schxslt2=1.3.1
    net.sf.saxon:Saxon-HE=12.9
    nu.validator:htmlparser=1.4.16
    org.apache.commons:commons-compress=1.28.0
    org.apache.httpcomponents.client5:httpclient5=5.5
    org.apache.logging.log4j:log4j-to-slf4j=2.25.1
    org.brotli:dec=0.1.2
    org.jetbrains.kotlin:kotlin-reflect=2.1.20
    org.jetbrains.kotlin:kotlin-stdlib=2.1.20
    org.jetbrains.kotlinx:kotlinx-coroutines-core=1.10.2
    org.jline:jline=3.30.4
    org.jline:jline-terminal-jansi=3.30.4
    org.nineml:coffeefilter=3.2.9
    org.nineml:coffeegrinder=3.2.9
    org.relaxng:jing=20241231
    org.slf4j:slf4j-api=2.0.17
    org.tukaani:xz=1.10
    org.xmlresolver:xmlresolver=6.0.19
These are the compile-time dependencies, the versions that the processor expected. The versions actually used are controlled by what appears on the classpath at runtime.
Any enabled extensions are also identified by the version command.
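Since the debug form of the version output is a list of KEY=VALUE lines, a shell script can pick out individual values with a small filter. A minimal sketch, assuming the output format shown above; the function name is hypothetical.

```shell
# Print the value of one KEY=VALUE line read from standard input.
# Usage: xmlcalabash info version --debug | version_field KEY
version_field() {
    awk -v key="$1" '
        # Match only lines that begin with "KEY=" and print what follows.
        index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }
    '
}

# Example uses (they require a working xmlcalabash command):
#   edition=$(xmlcalabash info version --debug | version_field SAXON_EDITION)
#   saxon=$(xmlcalabash info version --debug | version_field net.sf.saxon:Saxon-HE)
```

A literal prefix comparison is used instead of a regular expression so that the dots in dependency names are not treated as wildcards.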
The info mimetypes command
Displays the MIME types that have been registered by default, by the user’s configuration file, and by extension steps. There may be additional MIME types defined by Java using a .mime.types file or other mechanism. The underlying mapping doesn’t provide any API for enumerating the mappings, so there’s no way to list all of them.
    $ xmlcalabash info mimetypes
    Filename extension/content type mappings defined by XML Calabash:
    .7z is application/x-7z-compressed
    .a is application/x-archive
    .arj is application/x-arj
    .bmp is image/bmp
    .bz2 is application/bzip2
    …
    .yaml is application/x-yaml
    .yml is application/x-yaml
    .zip is application/zip
    Additional mappings may have been defined in the JVM, for example with a .mime.types file.
    See https://docs.oracle.com/javase/7/docs/api/javax/activation/MimetypesFileTypeMap.html
    Use the 'info mimetype <ext>' command to query the content type of a particular <ext>.
The info mimetype command
Returns information about the content type of files (or URIs) that end with .extension.
    $ xmlcalabash info mimetype xslt
    Filename extension/content type mapping:
    .xslt is application/xslt+xml
The query is against all of the defined mimetypes, both those defined by XML Calabash and those defined by Java using a .mime.types file or other mechanism.