Appendix A. Configuration
XML Calabash can read a configuration file to establish some default settings.
The configuration file is an XML document. All of the elements in the configuration file
must be in the https://xmlcalabash.com/ns/configuration namespace.
The conventional prefix for this namespace in the documentation is cc:.
Starting with XML Calabash 3.0.33, it is possible to set all of the run options in a configuration file. If a particular option is specified in the configuration file and on the command line, the command line value is used.
cc:xml-calabash
The document element of the configuration file is cc:xml-calabash:
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration"
default-xquery-processor? = anyURI
licensed? = boolean
line-numbering? = boolean
piped-io? = boolean
saxon-configuration? = string
stacktrace? = boolean
try-namespaces? = boolean
use-location-hints? = boolean
validation-mode? = strict|lax
verbosity? = trace|debug|info|warn|error
version? = 1.0>
(cc:catalog |
cc:extension |
cc:graphviz |
cc:initializer |
cc:inline |
cc:mimetype |
cc:namespace |
cc:paged-media |
cc:pipeline |
cc:proxy |
cc:saxon-configuration-property |
cc:send-mail |
cc:serialization |
cc:system-property |
cc:message-reporter |
cc:visualizer |
cc:threading |
cc:xml-schema |
cc:xquery-processor |
any-name)*
</cc:xml-calabash>
default-xquery-processor (URI)
Identifies the default XQuery processor. If unspecified, the default is https://saxonica.com/, the Saxonica processor. Other processors can be specified, but they must also be implemented and added to the classpath.
licensed (boolean)
If true, a licensed Saxon configuration will be requested. In practice, a licensed processor is used by default, if one is available. However, setting this property to false will explicitly request an unlicensed processor when Saxon PE or Saxon EE are on the classpath.
This can also be specified on the command line. The command-line setting takes precedence.
Schema-aware processing requires Saxon EE and a valid Saxon license.
line-numbering (boolean)
If true, line numbers are preserved in parsed documents. Prior to version 3.0.33, XML Calabash always preserved line numbers, but that makes each tree model larger in memory and the line numbers are generally unused. There’s one common exception: validation steps that operate on the tree model. RELAX NG validation, for example, will only be able to show error locations if the initial tree model was constructed with line numbers preserved. This can be done selectively with the cx:line-numbers parameter on the p:load step (or p:document), for example:
<p:document href="some-input.xml"
            parameters="map{'cx:line-numbering':true()}"/>
Why is this different in XML Calabash?
If you’re familiar with using a validation tool from the command line, the line numbering distinction made here may be perplexing. When you run a validator on the command line, the validator parses the document. Each element has a location and the validator gets those locations. When you parse a document with p:document or p:load, that’s also true. That’s why you always get error location information for a well-formedness error: those errors are caught by the parser.
But if you don’t enable line numbering, the location information is not preserved in the tree constructed by the parser. This reduces the amount of memory needed to store the tree and, except for a couple of extension functions, you can’t usually tell.
Except that when you pass a document to a validation step, the step starts with the tree. If the location information hasn’t been preserved in the tree, XML Calabash has no way to inform the validator about the locations of the elements.
Unless you have very large documents or a small amount of memory, it’s probably reasonable to just leave line numbering on by default.
piped-io (boolean)
If piped-io is true, XML Calabash will behave like a Unix pipeline. If there is no binding for the primary input port, it will read it from standard input. If there is no binding for the primary output port, it will write it to standard output. Irrespective of this setting, an explicit binding to standard input is possible with the --input option and an explicit binding to standard output is possible with the --output option.
When standard input is read, it will be parsed as the (first) content type listed on the primary input port. It will be parsed as XML if the primary input port doesn’t specify any content types.
saxon-configuration (filename)
The filename of a Saxon configuration file. This file will be loaded to initialize the Saxon configuration.
try-namespaces (boolean)
If true, implicit validation will attempt to retrieve the schema using the namespace URI. This can also be specified on the command line. The command-line setting takes precedence.
use-location-hints (boolean)
If true, implicit validation will use location hints to locate schemas. This can also be specified on the command line. The command-line setting takes precedence.
validation-mode (“lax” or “strict”)
Specifies a validation mode for implicit validation. This can also be specified on the command line. The command-line setting takes precedence.
verbosity
The default “verbosity” setting. This can also be specified on the command line. The command-line setting takes precedence.
version (string)
The configuration file version; must be 1.0.
console-output-encoding
This attribute was removed in XML Calabash version 3.0.11. See Section 1.2, “The console encoding”.
For simplicity, the content model of cc:xml-calabash allows every element to
occur an arbitrary number of times. Where an element defines a single, global setting, the last
value in document order applies.
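As a rough illustration of the attributes described above, a small configuration file might look like the following sketch. The attribute values are arbitrary choices for the example, not recommended settings.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration"
                 version="1.0"
                 verbosity="info"
                 line-numbering="true"
                 piped-io="false"
                 validation-mode="lax">
  <!-- Child elements such as cc:catalog or cc:serialization would go here. -->
</cc:xml-calabash>
```

Any of these settings that is also given on the command line is overridden by the command-line value.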
cc:catalog
Adds the specified catalog to the list of XML Catalogs used during resource resolution.
<cc:catalog xmlns:cc="https://xmlcalabash.com/ns/configuration"
href? = anyURI />
href (URI)
Location of the catalog file.
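For example, the following sketch adds two catalogs. The file locations are hypothetical placeholders; substitute the locations of your own catalog files.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- Hypothetical catalog locations, for illustration only. -->
  <cc:catalog href="file:///path/to/project-catalog.xml"/>
  <cc:catalog href="file:///path/to/shared-catalog.xml"/>
</cc:xml-calabash>
```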
cc:extension
Enables the named extension.
<cc:extension xmlns:cc="https://xmlcalabash.com/ns/configuration"
name = string />
Only one extension name is recognized: eager-uri-resolution.
See Eager URI resolution.
cc:graphviz
Identifies the location of the Graphviz executable. Making SVG diagrams of pipelines or graphs requires Graphviz.
<cc:graphviz xmlns:cc="https://xmlcalabash.com/ns/configuration"
dot = string
style? = string
output? = string />
dot (filename)
Location of the Graphviz “dot” executable.
style (filename)
The XSL stylesheet that will be used to style pipeline and graph output.
output (directory)
The default location for Graphviz output. Must be a directory.
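A minimal sketch of a Graphviz configuration follows. Both paths are assumptions made for the example: the location of the dot executable varies between systems, and the output directory is a placeholder.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- /usr/bin/dot is only a common location on Unix-like systems;
       /path/to/graphs is a hypothetical output directory. -->
  <cc:graphviz dot="/usr/bin/dot"
               output="/path/to/graphs"/>
</cc:xml-calabash>
```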
cc:initializer
Loads an extension class.
<cc:initializer xmlns:cc="https://xmlcalabash.com/ns/configuration"
class = string
ignore-errors? = boolean />
XML Calabash will attempt to instantiate the named class and pass it to Saxon during initialization.
If ignore-errors is false, failing to load
the class will throw an exception.
cc:inline
Properties related to p:inline elements.
<cc:inline xmlns:cc="https://xmlcalabash.com/ns/configuration"
trim-whitespace = boolean />
trim-whitespace (boolean)
It’s often convenient to use indentation in a pipeline document:
  …
  <p:with-input port="source">
    <p:inline>
      <document/>
    </p:inline>
  </p:with-input>
  …
But that introduces whitespace at the beginning and end of the inline document. As written, the document that is provided on the source port consists of: a newline, six spaces, the <document/> element, a newline and four spaces. Sometimes that’s annoying. It’s possible to rewrite the example so that there’s no insignificant whitespace, but that makes the pipeline harder to read.
If trim-whitespace is true, leading and trailing whitespace in p:inline elements is removed. This setting does not apply to implicit inlines because they never have leading or trailing whitespace.
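Enabling the setting in the configuration file can be sketched as follows:

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- Remove leading and trailing whitespace from explicit p:inline content. -->
  <cc:inline trim-whitespace="true"/>
</cc:xml-calabash>
```

With this in place, the indented example shown earlier would provide only the document element on the source port, without the surrounding newlines and spaces.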
cc:mimetype
Define additional filename extension to content type mappings.
<cc:mimetype xmlns:cc="https://xmlcalabash.com/ns/configuration"
content-type = string
extensions = string />
XML Calabash uses
javax.activation to look up MIME types. You can define
new types by creating an appropriately formatted
.mime.types file in your home directory. This will work for all
applications that read the .mime.types file.
Alternatively, you can define them in the configuration file.
content-type (MIME type)
The content type.
extensions (extension+)
A space-separated list of filename extensions to associate with the content type.
For example, this entry:
<cc:mimetype content-type="application/xml" extensions="xpl xproc"/>
will tell XML Calabash that filenames (or URIs, generally, in the absence of
server metadata) that end with
.xpl or .xproc should be interpreted as files with the
application/xml content type.
cc:namespace
Define additional namespace bindings for command line and configuration file option values.
<cc:namespace xmlns:cc="https://xmlcalabash.com/ns/configuration"
prefix = NCName
uri = string />
This option updates the set of namespaces that XML Calabash uses when evaluating options passed on the command line or in the configuration file. This includes both the option names and the expressions that define them.
This setting has no effect on the in-scope namespaces within a pipeline or within the documents accessed by the pipeline. This is exclusively about evaluating the names and values of options before the pipeline begins executing.
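The following sketch shows one way a binding might be used: the ex prefix is declared with cc:namespace and then used in the step attribute of cc:pipeline, which accepts a QName. The prefix, namespace URI, library location, and step type are all hypothetical.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- Hypothetical binding; it applies to option names and values on the
       command line and in this file, not inside the pipeline itself. -->
  <cc:namespace prefix="ex" uri="https://example.com/ns/steps"/>
  <!-- "ex:main" contains a colon, so it is resolved as a QName using the
       binding above. The library location is a placeholder. -->
  <cc:pipeline href="file:///path/to/library.xpl" step="ex:main"/>
</cc:xml-calabash>
```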
cc:paged-media
Select and configure paged media providers.
<cc:paged-media xmlns:cc="https://xmlcalabash.com/ns/configuration"
css-formatter? = string
xsl-formatter? = string
{any-name}* = string />
At least one of css-formatter or
xsl-formatter must be provided. The value of the
attribute should be the URI that identifies the processor that you want to select.
When searching for a CSS or XSL FO formatter, XML Calabash will try to instantiate
the processors in the order you specify them, selecting the first one that’s successfully
instantiated. To indicate that any acceptable processor can be used,
specify
https://xmlcalabash.com/paged-media/css-formatter for a CSS
processor, or
https://xmlcalabash.com/paged-media/xsl-formatter
for an XSL FO processor.
Any additional attribute/value pairs on the element are passed to the processor as configuration data. The accepted attributes and their valid values vary depending on the processor. No configuration properties are supported for the generic processors.
See p:css-formatter and p:xsl-formatter in the Reference Guide for details.
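As a sketch, the following configuration asks for any acceptable CSS formatter and any acceptable XSL FO formatter, using the generic URIs given above. No extra attributes are shown because the generic processors accept no configuration properties.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- Accept whichever CSS formatter can be instantiated. -->
  <cc:paged-media css-formatter="https://xmlcalabash.com/paged-media/css-formatter"/>
  <!-- Accept whichever XSL FO formatter can be instantiated. -->
  <cc:paged-media xsl-formatter="https://xmlcalabash.com/paged-media/xsl-formatter"/>
</cc:xml-calabash>
```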
cc:pipeline
Define a default pipeline.
<cc:pipeline xmlns:cc="https://xmlcalabash.com/ns/configuration"
href = anyURI
step? = NCName>
(cc:input |
cc:output |
cc:option)*
</cc:pipeline>
If no pipeline is supplied on the command line, the pipeline identified by the
href attribute will be run. If a step
is specified, then the pipeline must be a p:library.
If the step value is a string without a “:”,
it’s interpreted as a name and the p:declare-step with that name in the
library is selected; if the step value contains a “:”, then it
is interpreted as a QName (resolved against the combined namespace bindings
provided in the configuration file and on the command line) and the p:declare-step
with that type in the library is selected.
A pipeline can also provide default inputs, outputs, and options. Each of these applies only if an alternative input, output, or option is not specified on the command line.
Although the inputs, outputs, and options are nested inside the cc:pipeline
element, they are independent. The configured defaults apply even when an alternate
pipeline is specified on the command line (unless alternative inputs, outputs, or options
are also specified there).
cc:input
Define a set of default pipeline inputs.
<cc:input xmlns:cc="https://xmlcalabash.com/ns/configuration"
port = NCName
href? = anyURI
content-type? = string
encoding? = "base64">
(string |
any-name)?
</cc:input>
If href is specified, the document identified by that
URI will be loaded. The content-type and
encoding attributes are forbidden in this case.
Alternatively, the input may be provided inline. The
content-type and encoding attributes
determine how the content will be parsed. This is analogous to how p:inline
works.
The inline trim-whitespace flag does not apply to inputs in the configuration file.
cc:output
Define a set of default pipeline outputs.
<cc:output xmlns:cc="https://xmlcalabash.com/ns/configuration"
port = NCName
filespec = string />
The format of the filespec attribute
is discussed in the section about the --output option.
cc:option
Define a set of default pipeline options.
<cc:option xmlns:cc="https://xmlcalabash.com/ns/configuration"
name = EQName
value? = string
select? = string
content-type? = string
encoding? = "base64">
(string |
any-name)?
</cc:option>
Exactly one of value, select
or inline content must be provided. If value is specified, it
will be used as the (untyped atomic) value for the option.
If select is specified, it
will be treated as an XPath expression and evaluated to obtain the value. The expression
is evaluated without a context item.
If the value is provided inline, the content-type and encoding
attributes determine how the content will be parsed. This is analogous to how
p:inline works.
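Putting the pieces together, a default pipeline with default inputs and options might be sketched like this. The pipeline location, port names, option names, and values are hypothetical. cc:output is omitted because the filespec format is described with the --output option rather than here.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <cc:pipeline href="file:///path/to/pipeline.xpl">
    <!-- Default document for the "source" port, loaded from a URI;
         content-type and encoding are not allowed alongside href. -->
    <cc:input port="source" href="file:///path/to/input.xml"/>
    <!-- An inline input; content-type controls how the text is parsed,
         by analogy with p:inline. -->
    <cc:input port="settings" content-type="application/json">{"debug": true}</cc:input>
    <!-- A literal, untyped atomic value. -->
    <cc:option name="mode" value="draft"/>
    <!-- An XPath expression, evaluated without a context item. -->
    <cc:option name="limit" select="2 * 21"/>
  </cc:pipeline>
</cc:xml-calabash>
```

Each cc:option uses exactly one of value, select, or inline content, and any input or option also supplied on the command line replaces the default given here.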
cc:proxy
Define proxy URIs for internet protocol requests.
<cc:proxy xmlns:cc="https://xmlcalabash.com/ns/configuration"
scheme = string
uri = anyURI />
scheme (protocol scheme)
The protocol scheme.
uri (anyURI)
The proxy URI.
If your network configuration requires the use of a proxy, you can
define one with cc:proxy. For example, this establishes that
requests for
http: URIs should use the http://localhost:8888/
proxy.
<cc:proxy scheme="http" uri="http://localhost:8888"/>
cc:saxon-configuration-property
Sets a Saxon configuration property.
<cc:saxon-configuration-property xmlns:cc="https://xmlcalabash.com/ns/configuration"
name = string
value = string />
name (property name)
The Saxon configuration property name.
value
The property value.
XML Calabash does not maintain a list of valid properties. Those are
defined by Saxon. Attempting to set a property that doesn’t
exist will throw an exception. Boolean valued properties must have the value
true or false.
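As a sketch, the following sets a boolean Saxon property. It assumes that the property name is written as the Saxon feature URI; the feature shown, allow-external-functions, is one of Saxon's documented configuration features, but check the Saxon documentation for the names your version supports.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- Assumption: the name is given as the Saxon feature URI.
       Boolean properties take the value true or false. -->
  <cc:saxon-configuration-property
      name="http://saxon.sf.net/feature/allow-external-functions"
      value="false"/>
</cc:xml-calabash>
```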
cc:send-mail
Define properties for the p:send-mail step.
<cc:send-mail xmlns:cc="https://xmlcalabash.com/ns/configuration"
{any-name}* = string />
This element can be used to specify any properties for the
p:send-mail step. These values are the defaults; they will be overridden by
properties of the same name passed as parameters to the step.
Only attributes that are not in a namespace are passed through as properties. Any other attributes are ignored.
For backwards compatibility, the following four names are special:
host (string)
The SMTP server host.
port (integer)
The server port.
username (string)
The user name, if login is required.
password (string)
The password, if login is required.
In order to send mail, the p:send-mail step needs to know the location
of the SMTP server and login credentials, if they are required.
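A sketch using the four special names follows. The host, port, and credentials are placeholders, not working values.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- Placeholder server and credentials, for illustration only. -->
  <cc:send-mail host="smtp.example.com"
                port="587"
                username="pipeline-user"
                password="not-a-real-password"/>
</cc:xml-calabash>
```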
cc:serialization
Default serialization properties for particular content types.
<cc:serialization xmlns:cc="https://xmlcalabash.com/ns/configuration"
content-type = string
{any-name}* = string />
content-type (MIME type)
The content type.
any-name
Any attributes on the cc:serialization element other than content-type define the default serialization properties for documents with the corresponding content type.
For example, adding this to your configuration file:
<cc:serialization content-type="text/html"
                  method="html" html-version="5"/>
will serialize text/html documents using HTML 5 serialization by default.
The serialization properties on a document take precedence over these defaults.
cc:system-property
Set Java system properties before running a pipeline.
<cc:system-property xmlns:cc="https://xmlcalabash.com/ns/configuration"
name = string
value = string />
name (property name)
The Java system property name.
value
The property value.
Any properties specified in the configuration file will be set before the pipeline runs.
cc:message-reporter
Configure the message reporter. The only option is
buffer-size.
<cc:message-reporter xmlns:cc="https://xmlcalabash.com/ns/configuration"
buffer-size? = integer />
buffer-size (integer)
Sets the number of messages buffered. These can be retrieved in a pipeline with cx:pipeline-messages. If the value is zero or negative, there is no limit on the number of messages buffered. The default value is 32.
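For instance, the buffer could be enlarged as in this sketch. The value 100 is an arbitrary choice for the example.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- Keep up to 100 messages instead of the default 32;
       a value of 0 or less would remove the limit entirely. -->
  <cc:message-reporter buffer-size="100"/>
</cc:xml-calabash>
```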
cc:visualizer
Control which visualizer is used and its options.
<cc:visualizer xmlns:cc="https://xmlcalabash.com/ns/configuration"
name = silent|plain|detail
{any-name}* = string />
The name must be specified. Additional
attributes provide options for the visualizer.
There are three options for the name:
silent
Silent, no progress is reported.
plain
Plain, the name of each step is reported when it begins running. Most steps manufactured automatically during graph construction are omitted. There is one option, indent, which determines whether or not, and to what extent, reports are indented when they are nested inside compound steps.
detail
Detailed, the start and end of each step is identified and the documents that they produce can also be identified.
If the steps option is true, the progress of steps is recorded. (Defaults to true.)
If the documents option is true, the documents produced during execution are recorded. (Defaults to false.)
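As a sketch, the detailed visualizer with both of its options turned on could be configured like this:

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- Record step progress (already the default) and also the documents
       produced during execution (off by default). -->
  <cc:visualizer name="detail" steps="true" documents="true"/>
</cc:xml-calabash>
```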
cc:threading
Control aspects of XML Calabash threading.
<cc:threading xmlns:cc="https://xmlcalabash.com/ns/configuration"
count? = integer />
count (integer)
The size of the thread pool. If unspecified, the processor will use the maximum number of threads possible (one per processor). Specifying a value larger than the number of processors won’t increase the number of threads; at most one thread per processor will be allocated.
This configuration option enables threading. See Chapter 3, Running steps in parallel.
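A minimal sketch that limits the pool to four threads follows. The number is an arbitrary example value.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration" version="1.0">
  <!-- Use at most four threads; the effective number is still capped
       at one thread per processor. -->
  <cc:threading count="4"/>
</cc:xml-calabash>
```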
cc:xml-schema
Adds the specified XML Schema to the global validation context.
<cc:xml-schema xmlns:cc="https://xmlcalabash.com/ns/configuration"
href = anyURI />
href (URI)
Location of the schema file.
cc:xquery-processor
Configures an XQuery processor.
<cc:xquery-processor xmlns:cc="https://xmlcalabash.com/ns/configuration"
name = anyURI
{any-name}* = string />
The name attribute must identify the processor.
The name of the default processor is https://saxonica.com/ (it has no configurable options). The
BaseX processor can be configured with the name https://basex.org/.
The eXist-db processor can be configured with the name
https://exist-db.org/.
Providing a cc:xquery-processor configuration does not change
the default XQuery processor. You must use
default-xquery-processor
as well if you wish to change the default.
Any attributes other than the name attribute are passed to the processor, but
may not pass all the way through to the back end processor (there’s no standard API for passing
arbitrary options to an arbitrary processor).
In principle, however, this allows the configuration file to provide credentials for accessing a
remote server, for example. The special attribute
cc:fallback identifies the URI of the processor to use if
this processor is unavailable. If not specified, no fallback is provided and attempts
to use XQuery will fail if the nominated processor cannot be run.
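The following sketch combines these settings: it makes BaseX the default XQuery processor and falls back to the Saxonica processor if BaseX is unavailable. It assumes that a BaseX implementation is on the classpath. Processor-specific attributes such as credentials are left out because their names depend on the processor.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration"
                 version="1.0"
                 default-xquery-processor="https://basex.org/">
  <!-- Configure BaseX and name the processor to use if it cannot be run. -->
  <cc:xquery-processor name="https://basex.org/"
                       cc:fallback="https://saxonica.com/"/>
</cc:xml-calabash>
```

Without the default-xquery-processor attribute on the document element, the cc:xquery-processor element alone would leave the Saxonica processor as the default.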
Any other name
Configuration for extension steps can also be provided.
<{any-name}
{any-name}* = string />
Any element name can be used as long as it is not in the configuration namespace.
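As a purely hypothetical sketch, configuration for an extension step might look like this. The ex namespace, the element name, and the attribute are invented for the example; a real extension step defines its own names.

```xml
<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration"
                 xmlns:ex="https://example.com/ns/extension"
                 version="1.0">
  <!-- Hypothetical element; it is allowed here because it is not in the
       configuration namespace. -->
  <ex:my-step cache-dir="/path/to/cache"/>
</cc:xml-calabash>
```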