Chapter 2. Running XML Calabash
The easiest way to run XML Calabash is directly from the jar file:
java -jar xmlcalabash-3.0.0-alpha14.jar options…
Run this way, all of the dependencies included in the lib directory are automatically included on the classpath.
You can also run the main class directly, for example in a build system or a shell script:
java com.xmlcalabash.app.Main options…
But if you choose this form, you must ensure that the classpath contains both the XML Calabash jar file and all of the necessary dependencies. One feature of the second form is that it allows you to change or update the dependencies.
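As a rough sketch of the second form, the shell fragment below assembles a classpath from the main jar plus everything in the lib directory. The CALABASH_HOME variable, the assumed layout of the unpacked distribution, and the run_calabash_main function name are illustrative assumptions; only the main class name comes from the text above.

```shell
# Hypothetical location of the unpacked distribution; adjust as needed.
CALABASH_HOME=${CALABASH_HOME:-"$HOME/xmlcalabash-3.0.0-alpha14"}

# "lib/*" is a Java classpath wildcard meaning "every jar in lib".
# It stays inside double quotes so the shell hands it to java unexpanded.
# The ":" separator is for Unix-like systems; Windows uses ";".
CALABASH_CP="$CALABASH_HOME/xmlcalabash-3.0.0-alpha14.jar:$CALABASH_HOME/lib/*"

# Run the main class directly, passing all arguments through.
run_calabash_main() {
  java -cp "$CALABASH_CP" com.xmlcalabash.app.Main "$@"
}

# Example (not executed here): run_calabash_main version
```

Because the classpath is built in one place, replacing or adding a dependency only means changing the CALABASH_CP line or the contents of lib.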
In the fullness of time, XML Calabash will be available through Maven, which has extensive features for managing dependencies. In the short term, it’s just a bit messy.
The one library that you are likely to want to change is Saxon. If you have a license for Saxon PE or Saxon EE, it makes sense to swap out the Saxon HE library that ships with XML Calabash for PE or EE.
To save you the trouble of constructing and managing a large classpath, the alpha distribution of XML Calabash is configured with a hack. Simply delete the Saxon-HE-12.5.jar file from the lib directory and copy in your PE or EE jar instead. You must use a 12.5 jar file for this purpose!
Be careful to make sure there’s only one Saxon jar file in the lib directory. Having multiple versions of the same library on the classpath is an invitation for subtle and mysterious crashes.
Saxon generally searches for a license file on the classpath, which isn’t going to work with this hack. Instead, you can use the SAXON_HOME environment variable, as described in the documentation.
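A small sanity check along these lines might catch a leftover jar after the swap. This is only a sketch: the function names are made up, the jar name patterns are an assumption based on the usual Saxon file names, and the license check assumes that SAXON_HOME names a directory containing saxon-license.lic.

```shell
# Count the Saxon jars in a lib directory. The patterns are an assumption
# (Saxon-HE-*.jar, saxon-pe-*.jar, saxon-ee-*.jar, in either letter case);
# adjust them if your jars are named differently.
count_saxon_jars() {
  count=0
  for jar in "$1"/[Ss]axon-[Hh][Ee]-*.jar \
             "$1"/[Ss]axon-[Pp][Ee]-*.jar \
             "$1"/[Ss]axon-[Ee][Ee]-*.jar; do
    # An unmatched pattern stays literal, so test that the file exists.
    [ -e "$jar" ] && count=$((count + 1))
  done
  printf '%s\n' "$count"
}

# Fail if lib does not hold exactly one Saxon jar; warn if SAXON_HOME is
# set but no license file is found there.
check_saxon_setup() {
  lib_dir=$1
  n=$(count_saxon_jars "$lib_dir")
  if [ "$n" -ne 1 ]; then
    echo "expected exactly one Saxon jar in $lib_dir, found $n" >&2
    return 1
  fi
  if [ -n "${SAXON_HOME:-}" ] && [ ! -f "$SAXON_HOME/saxon-license.lic" ]; then
    echo "warning: no saxon-license.lic in $SAXON_HOME" >&2
  fi
  return 0
}

# Example (not executed here): check_saxon_setup /path/to/xmlcalabash/lib
```

A count other than one is a prompt to look at the directory, not proof of a problem; a distribution that ships extra Saxon-named jars would need narrower patterns.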
Elsewhere in this document, we assume that the command “xmlcalabash” runs XML Calabash. This can be replaced by the java -jar … version above or by your own script.
XML Calabash has a command-driven command-line interface. It supports three commands: run, to run a pipeline; version, to print the version; and help, to display help. If no command is given, the run command is assumed.
The run command
[--input:port=uri…] [--output:port=filespec…]
[--namespace:prefix=uri…] [--init:class-name…]
[--description:description-file] [--graph:graph-file]
[--step:step-name]
[--verbosity:verbosity] [--explain] [--visualizer:name]
[--trace:output-file] [--trace-documents:output-directory]
[--assertions:level]
[--licensed] [--debug] [--debugger] [--help]
pipeline.xpl [option=value…]
Where the options and arguments are:
run, selects the run command
If the command is omitted, run is assumed.
--configuration:configuration-file, identifies a configuration file
When XML Calabash begins, it reads configuration settings from a configuration file. If you don’t specify a configuration file, it will search first for .xmlcalabash3 in the current directory and then in your home directory.
--input:port=uri, identifies an input
This option associates the resource identified by the uri with the input port named port. If the --input option is repeated for the same port, the resources become a sequence of documents on that port, in the order specified.
The pipeline must have an input port named port.
--output:port=filespec, identifies an output
In the simplest case, at most one document appears on the output port and filespec is just a filename. The filespec can also be a template. The output documents are numbered, starting at 1. Within the filespec, the string “%d” will be replaced by the document number. (The string “%h” will be replaced by the document number in hexadecimal and “%o” will be replaced by the document number in octal, for no good reason.) Zeros can be added to pad the value: “%00d” will produce a document number that is at least two digits long, padded on the left with 0’s if there are fewer than 10 documents. For completeness, a literal “%” can be added to the filename with “%%” in the filespec.
If the filespec does not contain a number template, then all of the outputs on the port will be concatenated to the same file. You may not repeat the --output option for the same port.
The pipeline must have an output port named port.
If the pipeline writes to a port for which there is no corresponding --output option, the results will be written to “standard output”, usually the terminal window. When writing to the terminal:
A header is printed before the output identifying the port name, document number, and base URI.
A line of equal signs is printed as a separator between documents (if more than one document is output).
When XML or HTML content is output, the XML declaration is omitted.
Indentation is turned on.
The method for determining whether output is going to a terminal or being redirected isn’t terribly sophisticated and may be wrong in some circumstances. It’s safer to use --output to write to a file if you want to save the output.
--namespace:prefix=uri, identifies a namespace binding
Binds the specified prefix to the uri. This has no effect on the pipeline; these bindings are only used when evaluating expressions used to define options.
If no namespace bindings are provided, the default bindings associate xs, fn, map, array, and math with their traditional XML Schema and XPath URIs. The prefix saxon is also bound to the Saxon extension namespace, http://saxon.sf.net/.
--init:class-name, Saxon configuration initializer
Attempts to load and execute the class named class-name, which must be available on the class path and must implement the net.sf.saxon.lib.Initializer interface. (This is analogous to the -init: option on the Saxon command line API.)
--description:description-file, pipeline description
This option writes an XML description of the pipeline and the corresponding graphs to description-file.xml.
--graph:graph-file, SVG graph outputs
This option writes two SVG descriptions of the pipeline. The first, named graph-file.pipeline.svg, is a “boxes and arrows” diagram of the pipeline(s) to run, as interpreted by XML Calabash. The second, named graph-file.graph.svg, is a diagram of the graph(s) that were constructed from the pipeline(s). The XML Calabash runtime executes the graphs, not the pipelines.
--step:step-name, identifies a step
If the input pipeline document is a p:library, this option identifies a step within that library to run.
--verbosity:verbosity
There are five levels of verbosity. The level of verbosity determines how much detail is printed about the progress of a running pipeline.
trace
Lots of detail, show everything.
debug
Lots of detail.
info
Show relevant status messages.
warn
Show only warnings and errors.
error
No messages except fatal errors.
Setting the verbosity also sets the logging level.
--explain, explain errors
Enables error explanations. If error explanations are enabled, when a pipeline fails, in addition to the error message, a short explanation of the cause will be provided.
--visualizer:name
Selects a visualizer for reporting pipeline progress. There are three defined visualizers.
silent
Shows no progress reports.
plain
Reports the name of each step when it begins running. Most steps manufactured automatically during graph construction are omitted. There is one option, indent, which determines whether or not, and to what extent, reports are indented when they are nested inside compound steps.
detail
Reports the name of each step and may report on the documents produced.
If the steps option is true, the progress of steps is recorded. (Defaults to true.)
If the documents option is true, the documents produced during execution are recorded. (Defaults to false.)
Options can be specified on the command line as key=value pairs after the name. The name and any options are separated by a question mark (?). Key-value pairs are separated by semicolons (;). For example: --visualizer=detail?steps=true;documents=true.
--trace:output-file, defines a runtime trace file
A runtime trace of the pipeline execution will be written to output-file. (If the --trace-documents option is given but the --trace option is not, the default output file is named trace.xml in the --trace-documents output directory.)
For more information about tracing, see Appendix B, Tracing execution.
--trace-documents:output-directory, defines a runtime trace directory
If the --trace-documents option is given, the runtime trace will write a copy of every document that flows through the pipeline into the specified directory. These documents are identified in the --trace file.
--assertions:level
By default, assertions are disabled. You can enable them by setting an appropriate level:
error
Schematron reports are output and Schematron errors are treated as errors. (They cause the step to throw Q{http://xmlcalabash.com/ns/error}XI0041.)
warning (or warn)
Schematron reports are output and Schematron errors are treated as warnings.
ignore
Assertions are ignored.
--licensed
Requests a licensed processor. In practice, a licensed processor is used by default, if one is available. However, --licensed:false can be used to explicitly request an unlicensed processor when Saxon PE or Saxon EE is on the classpath.
Schema-aware processing requires Saxon EE and a valid Saxon license.
--debug
If debugging is enabled, the verbosity is set to at least the debug level and the backend logging framework’s “log level” is set to the verbosity.
This produces a lot of messages, typically on the standard error output. XML Calabash uses SLF4J and Logback for logging. You can configure it in the usual ways.
--debugger
Start an interactive debugging session on the pipeline. See Chapter 5, The interactive debugger.
--help
This is equivalent to issuing the help command. It’s provided as an option for convenience.
pipeline.xpl, the pipeline to run
This identifies the pipeline to run. If the root element is p:declare-step, then that pipeline will be run. If the root element is p:library, the first pipeline in the library will be run, unless the step option specifies an alternate pipeline.
option=value, sets an option
You can provide values for pipeline options on the command line. These override any defaults declared in the pipeline. There must be a pipeline option named option.
The option name can be an EQName or it can use a prefix previously defined with --namespace; if the option is a simple NCName, it is not in a namespace.
If the value begins with “?”, what follows is taken to be an XPath expression. That expression is evaluated using the namespace bindings defined. The context item is undefined. The result of evaluating the expression is the value of the option. If the value does not begin with a “?”, the whole string becomes the value as an xs:untypedAtomic.
If multiple assignments to the same option appear, the option’s value will be a sequence with those values in the order specified.
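Several of the arguments described above contain characters that a shell treats specially: “?” is a wildcard, “;” ends a command, and spaces split words. The sketch below builds an argument list and prints it without running anything, to show how quoting keeps each argument intact. The port names (source, result), the file names, and the pipeline options (title, limit) are hypothetical; the visualizer string is copied verbatim from the example in the text.

```shell
# Print each argument on its own line, in brackets, to show exactly what
# the shell would pass to the program.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Repeat --input for the same port, once per document, in the order wanted.
set --
for doc in chapter1.xml chapter2.xml; do
  set -- "$@" "--input:source=$doc"
done

# Single quotes keep ?, ; and spaces away from the shell.
set -- "$@" \
  '--output:result=chapter-%d.xml' \
  '--visualizer=detail?steps=true;documents=true' \
  pipeline.xpl \
  'title=Draft copy' \
  'limit=?2 * 5'

# Dry run: list the arguments instead of starting XML Calabash.
show_args xmlcalabash "$@"

# To run the pipeline for real (not executed here): xmlcalabash "$@"
```

In this sketch, title receives the plain string “Draft copy”, while limit begins with “?” and so would be evaluated as the XPath expression 2 * 5.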
The help command
Displays a short summary of the command line options and arguments, not dissimilar to the preceding section. If help is requested, all of the other command line arguments are ignored.
The version command
Displays the XML Calabash version and the version of Saxon:
$ xmlcalabash version
XML Calabash version 3.0.0-alpha14 (build d99cd48.191.08091a, 08 Jan 2025)
Running with Saxon HE version 12.5
Most options are ignored when the version command is used, but if the debug level of --verbosity is requested, the version summary will include details about third-party dependencies such as the HTML parser and XML resolver. In this case, the output is formatted in a way that can more easily be parsed, for example by a shell script.
$ xmlcalabash --debug
PRODUCT_NAME=XML Calabash
VERSION=3.0.0-alpha14
BUILD_DATE=2025-01-08
BUILD_ID=d99cd48.191.08091a
SAXON_EDITION=HE
VENDOR_NAME=Norm Tovey-Walsh
VENDOR_URI=https://xmlcalabash.com/
DEPENDENCY_brotliDec=0.1.2
DEPENDENCY_commonsCodec=1.17.0
DEPENDENCY_commonsCompress=1.27.1
DEPENDENCY_flexmarkAll=0.64.8
DEPENDENCY_graalvmJS=23.1.5
DEPENDENCY_htmlparser=1.4.16
DEPENDENCY_httpClient=5.3.1
DEPENDENCY_jing=20241231
DEPENDENCY_jsonSchemaValidator=1.5.4
DEPENDENCY_saxon=12.5
DEPENDENCY_schxslt2=1.3.1
DEPENDENCY_sinclude=5.2.4
DEPENDENCY_slf4j=2.0.16
DEPENDENCY_tukaaniXz=1.10
DEPENDENCY_uuidCreator=6.0.0
DEPENDENCY_xercesImpl=2.12.2
DEPENDENCY_xmlResolver=6.0.10
These are the compile-time dependencies, the versions that the processor expected. The versions actually used are controlled by what appears on the classpath at runtime.
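Since the detailed listing is one KEY=VALUE pair per line, a script can pick out individual values with very little code. The following is a minimal sketch; the version_value function name is made up, and the sample text is a few lines copied from the listing above rather than live output.

```shell
# Sample lines copied from the listing above. In practice this text would
# be captured from the xmlcalabash command that produces the listing.
version_info='PRODUCT_NAME=XML Calabash
VERSION=3.0.0-alpha14
SAXON_EDITION=HE
DEPENDENCY_saxon=12.5'

# Print the value for one key. Usage: version_value KEY TEXT
# Each line is split at the first "=" only, so values may contain spaces.
version_value() {
  printf '%s\n' "$2" | while IFS='=' read -r key value; do
    if [ "$key" = "$1" ]; then
      printf '%s\n' "$value"
    fi
  done
}

version_value VERSION "$version_info"
version_value DEPENDENCY_saxon "$version_info"
```

With the sample text, the two calls print 3.0.0-alpha14 and 12.5. Comparing DEPENDENCY_saxon against the jar actually present in lib is one way to notice a mismatch between the expected and the runtime version.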