Appendix A. Configuration

XML Calabash can read a configuration file to establish some default settings. The configuration file is an XML document. All of the elements in the configuration file must be in the https://xmlcalabash.com/ns/configuration namespace. The conventional prefix for this namespace in the documentation is cc:.

Starting with XML Calabash 3.0.33, it is possible to set all of the run options in a configuration file. If a particular option is specified in the configuration file and on the command line, the command line value is used.

cc:xml-calabash

The document element of the configuration file is cc:xml-calabash:

<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration"
  default-xquery-processor? = anyURI
  licensed? = boolean
  line-numbering? = boolean
  piped-io? = boolean
  saxon-configuration? = string
  stacktrace? = boolean
  try-namespaces? = boolean
  use-location-hints? = boolean
  validation-mode? = strict|lax
  verbosity? = trace|debug|info|warn|error
  version? = 1.0>
    (cc:catalog |
     cc:extension |
     cc:graphviz |
     cc:initializer |
     cc:inline |
     cc:mimetype |
     cc:namespace |
     cc:paged-media |
     cc:pipeline |
     cc:proxy |
     cc:saxon-configuration-property |
     cc:send-mail |
     cc:serialization |
     cc:system-property |
     cc:message-reporter |
     cc:visualizer |
     cc:threading |
     cc:xml-schema |
     cc:xquery-processor |
     any-name)*
</cc:xml-calabash>

default-xquery-processor (URI)

Identifies the default XQuery processor. If unspecified, the default is https://saxonica.com/, the Saxonica processor. Other processors can be specified, but they must also be implemented and added to the classpath.

licensed (boolean)

If true, a licensed Saxon configuration will be requested. In practice, a licensed processor is used by default, if one is available. However, setting this property to false will explicitly request an unlicensed processor when Saxon PE or Saxon EE are on the classpath.

This can also be specified on the command line. The command-line setting takes precedence.

Schema-aware processing requires Saxon EE and a valid Saxon license.

line-numbering (boolean)

If true, line numbers are preserved in parsed documents. Prior to version 3.0.33, XML Calabash always preserved line numbers, but doing so makes each tree model larger in memory, and the line numbers are generally unused. There’s one common exception: validation steps that operate on the tree model. RELAX NG validation, for example, will only be able to show error locations if the initial tree model was constructed with line numbers preserved. This can be done selectively with the cx:line-numbers parameter on the p:load step (or p:document), for example:

  |<p:document href="some-input.xml"
  |            parameters="map{'cx:line-numbering':true()}"/>
Why is this different in XML Calabash?

If you’re familiar with using a validation tool from the command line, the line numbering distinction made here may be perplexing. When you run a validator on the command line, the validator parses the document. Each element has a location and the validator gets those locations. When you parse a document with p:document or p:load, that’s also true. That’s why you always get error location information for a well-formedness error: those errors are caught by the parser.

But if you don’t enable line numbering, the location information is not preserved in the tree constructed by the parser. This reduces the amount of memory needed to store the tree and, except for a couple of extension functions, you can’t usually tell.

Except that when you pass a document to a validation step, the step starts with the tree. If the location information hasn’t been preserved in the tree, XML Calabash has no way to inform the validator about the locations of the elements.

Unless you have very large documents or a small amount of memory, it’s probably reasonable to just leave line numbering on by default.

piped-io (boolean)

If piped-io is true, XML Calabash will behave like a Unix pipeline. If there is no binding for the primary input port, it will read it from standard input. If there is no binding for the primary output port, it will write it to standard output. Irrespective of this setting, an explicit binding to standard input is possible with the --input option and an explicit binding to standard output is possible with the --output option.

When standard input is read, it will be parsed as the (first) content type listed on the primary input port. It will be parsed as XML if the primary input port doesn’t specify any content types.

saxon-configuration (filename)

The filename of a Saxon configuration file. This file will be loaded to initialize the Saxon configuration.

try-namespaces (boolean)

If true, implicit validation will attempt to retrieve the schema using the namespace URI. This can also be specified on the command line. The command-line setting takes precedence.

use-location-hints (boolean)

If true, implicit validation will use location hints to locate schemas. This can also be specified on the command line. The command-line setting takes precedence.

validation-mode (“lax” or “strict”)

Specifies a validation mode for implicit validation. This can also be specified on the command line. The command-line setting takes precedence.

verbosity

The default “verbosity” setting. This can also be specified on the command line. The command-line setting takes precedence.

version (string)

The configuration file version. It must be 1.0.

console-output-encoding

This attribute was removed in XML Calabash version 3.0.11. See Section 1.2, “The console encoding”.

For simplicity, the content model of cc:xml-calabash allows every element to occur an arbitrary number of times. Where an element defines a single, global setting, the last value in document order applies.
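
As a minimal sketch of how these pieces fit together, the following configuration file sets a few of the attributes described above and repeats cc:threading to illustrate the last-value rule. The attribute values are illustrative assumptions, not recommendations, and the example assumes that cc:threading counts as a single, global setting.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration"
  |                 version="1.0"
  |                 verbosity="info"
  |                 line-numbering="true"
  |                 piped-io="false">
  |  <!-- Repeated global setting: the last one in document order applies -->
  |  <cc:threading count="2"/>
  |  <cc:threading count="4"/>
  |</cc:xml-calabash>

Under those assumptions the thread pool size would be 4. A verbosity given on the command line would still take precedence over the info value set here.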

cc:catalog

Adds the specified catalog to the list of XML Catalogs used during resource resolution.

<cc:catalog xmlns:cc="https://xmlcalabash.com/ns/configuration"
  href? = anyURI />

href (URI)

Location of the catalog file.
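
For example, a configuration along these lines would add one catalog to the resolver. The catalog location is a hypothetical path chosen for illustration.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- catalog.xml is a hypothetical location -->
  |  <cc:catalog href="catalog.xml"/>
  |</cc:xml-calabash>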

cc:extension

Enables the named extension.

<cc:extension xmlns:cc="https://xmlcalabash.com/ns/configuration"
  name = string />

Only one extension name is recognized: eager-uri-resolution. See Eager URI resolution.
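
Enabling that extension might look like this:

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- eager-uri-resolution is the only recognized extension name -->
  |  <cc:extension name="eager-uri-resolution"/>
  |</cc:xml-calabash>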

cc:graphviz

Identifies the location of the Graphviz executable. Making SVG diagrams of pipelines or graphs requires Graphviz.

<cc:graphviz xmlns:cc="https://xmlcalabash.com/ns/configuration"
  dot = string
  style? = string
  output? = string />

dot (filename)

Location of the Graphviz “dot” executable.

style (filename)

The XSL stylesheet that will be used to style pipeline and graph output.

output (directory)

The default location for Graphviz output. Must be a directory.
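
A sketch of a Graphviz configuration follows. Both paths are assumptions about a typical Unix-like system; substitute the locations that apply on your machine.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- /usr/bin/dot and /tmp/graphs are hypothetical locations -->
  |  <cc:graphviz dot="/usr/bin/dot"
  |               output="/tmp/graphs"/>
  |</cc:xml-calabash>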

cc:initializer

Loads an extension class.

<cc:initializer xmlns:cc="https://xmlcalabash.com/ns/configuration"
  class = string
  ignore-errors? = boolean />

XML Calabash will attempt to instantiate the named class and pass it to Saxon during initialization.

If ignore-errors is false, failing to load the class will throw an exception.
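
As an illustration, the following entry asks XML Calabash to load an initializer class and to fail if it cannot be loaded. The class name is hypothetical; it stands in for a class that you have implemented and placed on the classpath.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- com.example.MyInitializer is a hypothetical class name -->
  |  <cc:initializer class="com.example.MyInitializer"
  |                  ignore-errors="false"/>
  |</cc:xml-calabash>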

cc:inline

Properties related to p:inline elements.

<cc:inline xmlns:cc="https://xmlcalabash.com/ns/configuration"
  trim-whitespace = boolean />

trim-whitespace (boolean)

It’s often convenient to use indentation in a pipeline document:

  |
  |  <p:with-input port="source">
  |    <p:inline>
  |      <document/>
  |    </p:inline>
  |  </p:with-input>
  |

But that introduces whitespace at the beginning and end of the inline document. As written, the document that is provided on the source port consists of: a newline, six spaces, the <document/> element, a newline and four spaces. Sometimes that’s annoying. It’s possible to rewrite the example so that there’s no insignificant whitespace, but that makes the pipeline harder to read.

If trim-whitespace is true, leading and trailing whitespace in p:inline elements is removed. This setting does not apply to implicit inlines because they never have leading or trailing whitespace.
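
To turn trimming on for every explicit p:inline, the configuration file might contain:

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- Remove leading and trailing whitespace from p:inline content -->
  |  <cc:inline trim-whitespace="true"/>
  |</cc:xml-calabash>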

cc:mimetype

Define additional mappings from filename extensions to content types.

<cc:mimetype xmlns:cc="https://xmlcalabash.com/ns/configuration"
  content-type = string
  extensions = string />

XML Calabash uses javax.activation to look up MIME types. You can define new types by creating an appropriately formatted .mime.types file in your home directory. This will work for all applications that read the .mime.types file.

Alternatively, you can define them in the configuration file.

content-type (MIME type)

The content-type.

extensions (extension+)

A space-separated list of filename extensions to associate with the content type.

For example, this entry:

  |<cc:mimetype content-type="application/xml" extensions="xpl xproc"/>

Will tell XML Calabash that filenames (or URIs, generally, in the absence of server metadata) that end with .xpl or .xproc should be interpreted as files with the application/xml content type.

cc:namespace

Define additional namespace bindings for command line and configuration file option values.

<cc:namespace xmlns:cc="https://xmlcalabash.com/ns/configuration"
  prefix = NCName
  uri = string />

This option updates the set of namespaces that XML Calabash uses when evaluating options passed on the command line or in the configuration file. This includes both the option names and the expressions that define them.

Important

This setting has no effect on the in-scope namespaces within a pipeline or within the documents accessed by the pipeline. This is exclusively about evaluating the names and values of options before the pipeline begins executing.
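
The following sketch binds a prefix and then uses it in the name of a default option. The prefix, namespace URI, pipeline location, and option name are all hypothetical, and the example assumes that a prefixed option name in cc:option is resolved against these bindings.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- Hypothetical prefix and namespace URI -->
  |  <cc:namespace prefix="ex" uri="https://example.com/ns/options"/>
  |  <cc:pipeline href="pipeline.xpl">
  |    <!-- ex:mode is assumed to resolve using the binding above -->
  |    <cc:option name="ex:mode" value="draft"/>
  |  </cc:pipeline>
  |</cc:xml-calabash>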

cc:paged-media

Select and configure paged media providers.

<cc:paged-media xmlns:cc="https://xmlcalabash.com/ns/configuration"
  css-formatter? = string
  xsl-formatter? = string
  {any-name}* = string />

At least one of css-formatter or xsl-formatter must be provided. The value of the attribute should be the URI that identifies the processor that you want to select.

When searching for a CSS or XSL FO formatter, XML Calabash will try to instantiate the processors in the order you specify them, selecting the first one that’s successfully instantiated. To indicate that any acceptable processor can be used, specify https://xmlcalabash.com/paged-media/css-formatter for a CSS processor, or https://xmlcalabash.com/paged-media/xsl-formatter for an XSL FO processor.

Any additional attribute/value pairs on the element are passed to the processor as configuration data. The accepted attributes and their valid values vary depending on the processor. No configuration properties are supported for the generic processors.

See p:css-formatter and p:xsl-formatter in the Reference Guide for details.
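
As a minimal sketch, this entry accepts any available processor for both CSS and XSL FO formatting by using the generic URIs given above. No additional attributes are supplied because the generic processors support no configuration properties.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- Generic URIs: any acceptable processor may be selected -->
  |  <cc:paged-media
  |      css-formatter="https://xmlcalabash.com/paged-media/css-formatter"
  |      xsl-formatter="https://xmlcalabash.com/paged-media/xsl-formatter"/>
  |</cc:xml-calabash>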

cc:pipeline

Define a default pipeline.

<cc:pipeline xmlns:cc="https://xmlcalabash.com/ns/configuration"
  href = anyURI
  step? = NCName>
    (cc:input |
     cc:output |
     cc:option)*
</cc:pipeline>

If no pipeline is supplied on the command line, the pipeline identified by the href attribute will be run. If a step is specified, then the pipeline must be a p:library. If the step value is a string without a “:”, it’s interpreted as a name and the p:declare-step with that name in the library is selected; if the step value contains a “:”, then it is interpreted as a QName (resolved against the combined namespace bindings provided in the configuration file and on the command line) and the p:declare-step with that type in the library is selected.

A pipeline can also provide default inputs, outputs, and options. Each of these applies only if an alternative input, output, or option is not specified on the command line.

Although the inputs, outputs, and options are nested inside the cc:pipeline element, they are independent. The configured defaults apply even when an alternate pipeline is specified on the command line (unless alternative inputs, outputs, or options are also specified there).

cc:input

Define a set of default pipeline inputs.

<cc:input xmlns:cc="https://xmlcalabash.com/ns/configuration"
  port = NCName
  href? = anyURI
  content-type? = string
  encoding? = "base64">
    (string |
     any-name)?
</cc:input>

If href is specified, the document identified by that URI will be loaded. The content-type and encoding attributes are forbidden in this case.

Alternatively, the input may be provided inline. The content-type and encoding attributes determine how the content will be parsed. This is analogous to how p:inline works.

Note

The inline trim-whitespace flag does not apply to inputs in the configuration file.

cc:output

Define a set of default pipeline outputs.

<cc:output xmlns:cc="https://xmlcalabash.com/ns/configuration"
  port = NCName
  filespec = string />

The format of the filespec attribute is discussed in the section about the --output option.

cc:option

Define a set of default pipeline options.

<cc:option xmlns:cc="https://xmlcalabash.com/ns/configuration"
  name = EQName
  value? = string
  select? = string
  content-type? = string
  encoding? = "base64">
    (string |
     any-name)?
</cc:option>

Exactly one of value, select or inline content must be provided. If value is specified, it will be used as the (untyped atomic) value for the option. If select is specified, it will be treated as an XPath expression and evaluated to obtain the value. The expression is evaluated without a context item.

If the value is provided inline, the content-type and encoding attributes determine how the content will be parsed. This is analogous to how p:inline works.
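
Putting cc:pipeline, cc:input, and cc:option together, a default pipeline could be sketched as follows. The file names, the step name, the port names, and the option names are hypothetical. A cc:output is left out because the filespec format is described with the --output option.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- library.xpl is assumed to be a p:library containing a
  |       p:declare-step named "main" -->
  |  <cc:pipeline href="library.xpl" step="main">
  |    <!-- Default input loaded from a URI -->
  |    <cc:input port="source" href="input.xml"/>
  |    <!-- Default input provided inline -->
  |    <cc:input port="settings"><settings/></cc:input>
  |    <!-- value: used as an untyped atomic value -->
  |    <cc:option name="limit" value="10"/>
  |    <!-- select: an XPath expression evaluated without a context item -->
  |    <cc:option name="today" select="current-date()"/>
  |  </cc:pipeline>
  |</cc:xml-calabash>

Each of these defaults applies only if the command line does not supply an alternative for the same input or option.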

cc:proxy

Define proxy URIs for internet protocol requests.

<cc:proxy xmlns:cc="https://xmlcalabash.com/ns/configuration"
  scheme = string
  uri = anyURI />

scheme (protocol scheme)

The protocol scheme.

uri (anyURI)

The proxy URI.

If your network configuration requires the use of a proxy, you can define it with cc:proxy. For example, this establishes that requests for http: URIs should use the http://localhost:8888/ proxy.

  |<cc:proxy scheme="http" uri="http://localhost:8888"/>

cc:saxon-configuration-property

Sets a Saxon configuration property.

<cc:saxon-configuration-property xmlns:cc="https://xmlcalabash.com/ns/configuration"
  name = string
  value = string />

name (property name)

The Saxon configuration property name.

value

The property value.

XML Calabash does not maintain a list of valid properties. Those are defined by Saxon. Attempting to set a property that doesn’t exist will throw an exception. Boolean valued properties must have the value true or false.
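
As a hedged illustration, the entry below sets a boolean Saxon property. It assumes that the property name is given as the Saxon feature URI; the feature shown is a Saxon configuration feature, but consult the Saxon documentation for the properties that your Saxon version supports.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- Assumption: name is the Saxon feature URI; value is true or false -->
  |  <cc:saxon-configuration-property
  |      name="http://saxon.sf.net/feature/xinclude-aware"
  |      value="true"/>
  |</cc:xml-calabash>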

cc:send-mail

Define properties for the p:send-mail step.

<cc:send-mail xmlns:cc="https://xmlcalabash.com/ns/configuration"
  {any-name}* = string />

This element can be used to specify any properties for the p:send-mail step. These values are the defaults; they will be overridden by properties of the same name passed as parameters to the step.

Only attributes that are not in a namespace are passed through as properties. Any other attributes are ignored.

For backwards compatibility, the following four names are special:

host (string)

The SMTP server host.

port (integer)

The server port.

username (string)

The user name, if login is required.

password (string)

The password, if login is required.

In order to send mail, the p:send-mail step needs to know the location of the SMTP server and login credentials, if they are required.
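
A configuration using the four special names might look like the following sketch. The host, port, and credentials are placeholders, not working values.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- Placeholder server and credentials -->
  |  <cc:send-mail host="smtp.example.com"
  |                port="587"
  |                username="mailuser"
  |                password="secret"/>
  |</cc:xml-calabash>

Because the configuration file then contains a password, it is worth restricting who can read it.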

cc:serialization

Default serialization properties for particular content types.

<cc:serialization xmlns:cc="https://xmlcalabash.com/ns/configuration"
  content-type = string
  {any-name}* = string />

content-type (MIME type)

The content type.

any-name

Any attributes on the cc:serialization element other than content-type define the default serialization properties for documents with the corresponding content type.

For example, adding this to your configuration file:

  |<cc:serialization content-type="text/html"
  |                  method="html" html-version="5"/>

Will serialize text/html documents using HTML 5 serialization by default. The serialization properties on a document take precedence over these defaults.

cc:system-property

Set Java system properties before running a pipeline.

<cc:system-property xmlns:cc="https://xmlcalabash.com/ns/configuration"
  name = string
  value = string />

name (property name)

The Java system property name.

value

The property value.

Any properties specified in the configuration file will be set before the pipeline runs.
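
For example, the following entry sets one system property. The property name and value are hypothetical; they stand in for whatever property your pipeline or its extension steps read.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- com.example.mode is a hypothetical property name -->
  |  <cc:system-property name="com.example.mode" value="testing"/>
  |</cc:xml-calabash>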

cc:message-reporter

Configure the message reporter. The only option is buffer-size.

<cc:message-reporter xmlns:cc="https://xmlcalabash.com/ns/configuration"
  buffer-size? = integer />

buffer-size (integer)

Sets the number of messages buffered. These can be retrieved in a pipeline with cx:pipeline-messages. If the value is zero or negative, there is no limit on the number of messages buffered. The default value is 32.
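
To keep more messages than the default, a configuration could raise the buffer size. The value 128 is an arbitrary illustration.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- Buffer up to 128 messages; 0 or a negative value means no limit -->
  |  <cc:message-reporter buffer-size="128"/>
  |</cc:xml-calabash>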

cc:visualizer

Control which visualizer is used and its options.

<cc:visualizer xmlns:cc="https://xmlcalabash.com/ns/configuration"
  name = silent|plain|detail
  {any-name}* = string />

The name must be specified. Additional attributes provide options for the visualizer.

There are three options for the name:

silent

Silent, no progress is reported.

plain

Plain, the name of each step is reported when it begins running. Most steps manufactured automatically during graph construction are omitted. There is one option, indent, which determines whether or not, and to what extent, reports are indented when they are nested inside compound steps.

detail

Detailed, the start and end of each step is identified and the documents that they produce can also be identified.

If the steps option is true, the progress of steps is recorded. (Defaults to true.)

If the documents option is true, the documents produced during execution are recorded. (Defaults to false.)
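
As a sketch, the detail visualizer could be selected with both of its options set explicitly. This assumes the options are written as attributes with the values true and false, as the description above suggests.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- Record step progress and the documents produced -->
  |  <cc:visualizer name="detail"
  |                 steps="true"
  |                 documents="true"/>
  |</cc:xml-calabash>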

cc:threading

Control aspects of XML Calabash threading.

<cc:threading xmlns:cc="https://xmlcalabash.com/ns/configuration"
  count? = integer />

count (integer)

The size of the thread pool. If unspecified, the processor will use the maximum number of threads possible (one per processor). Specifying a value larger than the number of processors won’t increase the number of threads; at most one thread per processor will be allocated.

This configuration option enables threading; see Chapter 3, Running steps in parallel.
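
For instance, this entry enables threading with a pool of four threads. The count is an arbitrary illustration; on a machine with fewer than four processors, fewer threads would be allocated.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- At most one thread per processor is allocated -->
  |  <cc:threading count="4"/>
  |</cc:xml-calabash>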

cc:xml-schema

Adds the specified XML Schema to the global validation context.

<cc:xml-schema xmlns:cc="https://xmlcalabash.com/ns/configuration"
  href = anyURI />

href (URI)

Location of the schema file.
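
A schema could be added to the global validation context like this. The schema location is a hypothetical path.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration">
  |  <!-- schemas/book.xsd is a hypothetical location -->
  |  <cc:xml-schema href="schemas/book.xsd"/>
  |</cc:xml-calabash>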

cc:xquery-processor

Configures an XQuery processor.

<cc:xquery-processor xmlns:cc="https://xmlcalabash.com/ns/configuration"
  name = anyURI
  {any-name}* = string />

The name attribute must identify the processor. The name of the default processor is https://saxonica.com/ (it has no configurable options). The BaseX processor can be configured with the name https://basex.org/. The eXist-db processor can be configured with the name https://exist-db.org/.

Note

Providing a cc:xquery-processor configuration does not change the default XQuery processor. You must use default-xquery-processor as well if you wish to change the default.

Any attributes other than the name attribute are passed to the processor, but may not pass all the way through to the back end processor (there’s no standard API for passing arbitrary options to an arbitrary processor). In principle, however, this allows the configuration file to provide credentials for accessing a remote server, for example. The special attribute cc:fallback identifies the URI of the processor to use if this processor is unavailable. If not specified, no fallback is provided and attempts to use XQuery will fail if the nominated processor cannot be run.
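
The sketch below makes BaseX the default XQuery processor and falls back to the Saxon processor if BaseX is unavailable. It uses only the processor URIs named above and passes no processor-specific attributes, since the accepted attributes depend on the processor. It assumes the BaseX implementation is on the classpath.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration"
  |                 default-xquery-processor="https://basex.org/">
  |  <!-- cc:fallback names the processor to use if BaseX is unavailable -->
  |  <cc:xquery-processor name="https://basex.org/"
  |                       cc:fallback="https://saxonica.com/"/>
  |</cc:xml-calabash>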

Any other name

Configuration for extension steps can also be provided.

<{any-name}
  {any-name}* = string />

Any element name can be used as long as it is not in the configuration namespace.
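
As an illustration only, an extension step might read its settings from an element like the one below. The element name, its namespace, and its attribute are all hypothetical; the names that are actually recognized depend on the extension step.

  |<cc:xml-calabash xmlns:cc="https://xmlcalabash.com/ns/configuration"
  |                 xmlns:ex="https://example.com/ns/steps">
  |  <!-- Hypothetical extension configuration, outside the cc: namespace -->
  |  <ex:my-step cache-directory="/tmp/cache"/>
  |</cc:xml-calabash>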