Chapter 5. Pipelines vs. Graphs

XML Calabash transforms a pipeline into a collection of graphs. It is the graphs that are evaluated at runtime.

To explore the difference between pipelines and graphs, consider the example pipeline in Figure 5.1, “Sample debugging pipeline”.

Unnecessary complexity

The example pipeline in this chapter is straightforward, but it is unnecessarily complicated in order to highlight some of the differences between pipelines and graphs. An equivalent and more concise version is:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.0">
<p:output port="result" sequence="true"/>

<p:xinclude>
  <p:with-input href="../xml/default-input.xml"/>
</p:xinclude>

<p:add-attribute name="add" match="/*/*"
                 attribute-name="role" attribute-value="test"/>

</p:declare-step>

But that’s not as interesting for this discussion.

The pipeline declares two steps and uses one additional compound step.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="https://xmlcalabash.com/ns/examples"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.0">
<p:output port="result" sequence="true"/>

<p:declare-step type="ex:ident">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:identity/>
</p:declare-step>

<p:xinclude>
  <p:with-input href="../xml/default-input.xml"/>
</p:xinclude>

<ex:ident name="ex1"/>

<p:for-each>
  <p:with-input select="/ex:doc/*"/>
  <p:identity name="id1"/>
  <p:add-attribute name="add"
                   attribute-name="role" attribute-value="test"/>
</p:for-each>

<p:wrap-sequence expand-text="false"
                 wrapper="Q{https://xmlcalabash.com/ns/examples}set"/>

<p:identity name="id2"/>

</p:declare-step>

Figure 5.1. Sample debugging pipeline

The processor analyzes the pipeline (resolves use-when attributes, default readable ports, etc.) and constructs two models, one for each pipeline, like the ones shown in Figure 5.2, “The pipeline models”.

You can generate the graphs for any pipeline with the --graphs command line option.

Figure 5.2. The pipeline models

Subfigure 5.2.1. The pipeline model

Subfigure 5.2.2. The ex:ident model

The processor makes some changes in the pipeline. Some occurrences of p:document, p:inline, and even p:option and p:variable, are “promoted” to steps. This would be burdensome for authoring, but simplifies the implementation.

The processor then analyzes the pipeline to construct a set of graphs. It’s the graphs that actually get executed. The graphs for this pipeline are shown in Figure 5.3, “The graph models”.

Figure 5.3. The graph models

Subfigure 5.3.1. The pipeline graph

Subfigure 5.3.2. The ex:ident graph

Compound steps are represented as separate graphs. They have also grown “head” and “foot” steps that manage the boundaries.

Styling the SVG

The actual SVG diagrams are constructed in several steps. If you run with the --graphs and --debug options, all of the intermediate files will be saved in the graph output directory. The starting point is the pipeline description itself, pipeline.xml. This document contains an XML description of the compiled pipelines and their graphs. The pipeline description vocabulary is described in Section 5.2, “Description reference”.

The graphs are ultimately constructed with Graphviz. The pipeline.xml document is styled, then converted to a number of “.dot” files, each processed with Graphviz.

The default styling produces results like the ones you’ve already seen.

In Graphviz terms, each step is a node (the nested rectangles), inside each node is a table, and edges join the nodes. In particular, the first and last rows of each table contain the inputs and outputs for the step: the “ports” in Graphviz terminology.

The first 50 or so lines of a pipeline description are shown in Figure 5.4, “The description markup for the ex:ident graph”.

<g:description xmlns:g="http://xmlcalabash.com/ns/description">
   <g:pipeline-container xmlns="http://www.w3.org/1999/xhtml"
                         xmlns:dot="http://xmlcalabash.com/ns/dot"
                         xmlns:h="http://www.w3.org/1999/xhtml">
      <g:declare-step xmlns:cx="http://xmlcalabash.com/ns/extensions"
                      xmlns:ex="https://xmlcalabash.com/ns/examples"
                      xmlns:p="http://www.w3.org/ns/xproc"
                      xmlns:xs="http://www.w3.org/2001/XMLSchema"
                      name="!declare-step"
                      base-uri="file:/Volumes/Projects/xproc/xmlcalabash3/documentation/src/examples/xpl/debugger.xpl"
                      id="IC475"
                      filename="p_declare-step"
                      version="3.0"
                      dot:label="p:declare-step">
         <g:input dot:peripheries="0"
                  dot:shape="house"
                  h:cellspacing="0"
                  h:border="0"
                  h:cellborder="1">
            <g:port id="output_5" primary="true" sequence="true">result</g:port>
         </g:input>
         <g:head dot:shape="parallelogram"
                 dot:peripheries="0"
                 h:cellspacing="0"
                 h:border="0"
                 h:cellborder="1">
            <g:inputs/>
            <g:outputs/>
         </g:head>
         <g:atomic-step dot:peripheries="0"
                        h:cellspacing="0"
                        h:border="0"
                        h:cellborder="1"
                        name="!document"
                        type="cx:document"
                        href="../xml/default-input.xml">
            <g:inputs/>
            <g:detail>
               <td>cx:document</td>
            </g:detail>
            <g:detail>
               <td>href="../xml/default-input.xml"</td>
            </g:detail>
            <g:outputs>
               <g:port id="_7" primary="true" sequence="false">result</g:port>
            </g:outputs>
         </g:atomic-step>
         <g:atomic-step dot:peripheries="0"
                        h:cellspacing="0"

Figure 5.4. The description markup for the ex:ident graph

Default styling has added attributes to this markup and “rows” to the atomic steps. Each g:detail element represents a row in the table and should contain one or more td elements.

⚠

Caution

Styling can also change the structure of the graph, but that risks creating a graph that doesn’t accurately reflect the structure of the original pipeline, which may be very misleading.

Attributes in the “dot” namespace (http://xmlcalabash.com/ns/dot) can be added to supply Graphviz attributes. Attributes in the “html” namespace (http://www.w3.org/1999/xhtml) can be added to supply the HTML styling attributes that Graphviz supports. You should use unqualified attribute names on the “HTML” table elements that appear in the output. The pipeline description vocabulary enumerates the HTML attributes that Graphviz understands (according to the Graphviz documentation), but that enumeration is only documentation; the styled document is not validated.

Default styling

If you don’t specify a stylesheet, the pipelines and graphs are styled with a default stylesheet. The default stylesheet:

Labels each step with its type and name (if a name was provided).
Puts pipeline inputs and outputs in “house” shapes with fewer borders.
Gives the head and foot elements a parallelogram shape with fewer table borders.
Labels variable and option atomic steps appropriately and adds the variable name as a detail. If a default expression is provided, that’s also added as a detail.
Adds links between subpipelines and their graphs.
Uses dashed lines for “depends” edges.
Adds “⋮” as a label to edges where the output allows a sequence but the input does not.

Despite the earlier admonition about changing the graph structure, the default style does make one structural change. Any output port that is unconnected is connected to a “sink” represented as a point.

Custom styling

Providing your own stylesheet allows you to achieve any custom styling that you’d like. Your stylesheet is applied after the standard stylesheet. This gives you the opportunity to add or remove g:detail elements, or to change the styling of elements.
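As a rough illustration of adding and removing details, the following sketch drops any detail row whose cell text begins with href= and appends a row showing the id of each atomic step that has one. It assumes the custom stylesheet sees markup like that in Figure 5.4; the choice of the id attribute and of the href rows is purely illustrative, not something the default styling requires.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- The default namespace is XHTML so that the literal td element below
     is created unprefixed in the HTML namespace. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:g="http://xmlcalabash.com/ns/description"
                xmlns:h="http://www.w3.org/1999/xhtml"
                xmlns="http://www.w3.org/1999/xhtml"
                version="3.0">

<!-- Copy everything that isn't explicitly matched below. -->
<xsl:mode on-no-match="shallow-copy"/>

<!-- Remove detail rows whose cell only reports an href. -->
<xsl:template match="g:detail[h:td[starts-with(normalize-space(.), 'href=')]]"/>

<!-- Append one extra row (illustrative) after the existing detail rows
     of every atomic step that has an id. -->
<xsl:template match="g:atomic-step[@id]">
  <xsl:copy>
    <xsl:apply-templates select="@*, g:inputs, g:detail"/>
    <g:detail>
      <td><xsl:value-of select="concat('id=', @id)"/></td>
    </g:detail>
    <xsl:apply-templates select="g:outputs"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

The new row is placed between the existing details and g:outputs so that the inputs remain the first row of the table and the outputs the last.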

Your stylesheet is applied to each pipeline and graph separately; this makes it much easier to manage links between elements. You’ll never accidentally match a name or ID from a different pipeline or graph. The same custom stylesheet is applied to both pipelines and graphs; you can distinguish between them by the root element, which will be g:pipeline-container for pipelines and g:graph-container for graphs.

Background colors applied to the container elements apply to the entire “canvas”; background colors on nested elements apply to smaller regions. Note that if the graph style of an element is “invis” (as it is on graph elements by default), the background color is ignored.
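Because the root element differs, one stylesheet can treat the two views differently. The sketch below gives pipelines and graphs different canvas colors. The attribute name dot:bgcolor is an assumption modeled on Graphviz’s bgcolor graph attribute, and the color values are arbitrary; check the generated “.dot” files to confirm the effect.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dot="http://xmlcalabash.com/ns/dot"
                xmlns:g="http://xmlcalabash.com/ns/description"
                version="3.0">

<!-- Copy everything that isn't explicitly matched below. -->
<xsl:mode on-no-match="shallow-copy"/>

<!-- Pipelines: pale yellow canvas (assumed attribute name). -->
<xsl:template match="/g:pipeline-container">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="dot:bgcolor" select="'#fffff0'"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>

<!-- Graphs: pale blue canvas (assumed attribute name). -->
<xsl:template match="/g:graph-container">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="dot:bgcolor" select="'#f0f8ff'"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Adding the attribute after the existing attributes have been copied means it replaces any value of the same name that the default styling supplied.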

The stylesheet in Figure 5.5, “Custom pipeline and graph styling” colors primary inputs and outputs in pale blue and makes implicit connections gray rather than black. (The “implicitness” of connections is roughly analogous to when the connection was made automatically to the default readable port, but it should be taken with a grain of salt; compiling a pipeline involves adding, removing, and changing connections for a number of reasons and it’s not always clear which ones are implicit and which ones aren’t.)

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dot="http://xmlcalabash.com/ns/dot"
                xmlns:g="http://xmlcalabash.com/ns/description"
                xmlns:h="http://www.w3.org/1999/xhtml"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="3.0">
<xsl:output method="xml" encoding="utf-8" indent="no"/>

<!-- Copy everything that isn't explicitly matched below. -->
<xsl:mode on-no-match="shallow-copy"/>

<!-- Give primary ports a pale blue background. -->
<xsl:template match="g:port[@primary='true']">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="h:bgcolor" select="'#ccccff'"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>

<!-- Draw implicit connections in gray rather than black. -->
<xsl:template match="g:edge[@implicit = 'true']">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="dot:color" select="'#8a8a8a'"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Figure 5.5. Custom pipeline and graph styling

The result of applying custom styling is shown in Figure 5.6, “The ex:ident graph with custom styling”.

Figure 5.6. The ex:ident graph with custom styling

Description reference

The XML vocabulary used to describe and style pipelines and graphs is a bit ad hoc. It started out as an internal format without any serious documentation. The observation that the graphs are a useful teaching aid persuaded me to try to make it a little more user friendly.

The pipeline description and the graph description have been made as similar as possible so that they can be styled more easily. The description schema doesn’t attempt to be prescriptive; in practice some combinations of attributes don’t arise, even though they’d be allowed by the schema. For example, a cx:document atomic step may have an href attribute, but no other step type will, nor will a cx:document ever have an expression.

The format leans heavily on the features of Graphviz. Familiarity with the Graphviz “dot” model and the way that nodes can be formatted with HTML table markup will probably be an aid to comprehension.

It’s perhaps a little easier to understand the vocabulary starting from the center and working out. Most of the interesting markup is in atomic steps. An atomic step has inputs and outputs; those become table cells in the first and last rows of the table, respectively. In between the inputs and outputs, additional rows can be inserted with g:detail elements that also contain table cells. The atomic step container becomes a table. The table is wrapped in a node.
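To make the correspondence concrete, here is a hand-written sketch of how an atomic step might be described. It is modeled on the markup in Figure 5.4, not copied from real output: the step name, the port ids (in_1 and out_1), and the detail rows are hypothetical.

```xml
<g:atomic-step xmlns:g="http://xmlcalabash.com/ns/description"
               xmlns:p="http://www.w3.org/ns/xproc"
               xmlns="http://www.w3.org/1999/xhtml"
               name="id1"
               type="p:identity">
   <!-- First table row: one cell per input port. -->
   <g:inputs>
      <g:port id="in_1" primary="true" sequence="true">source</g:port>
   </g:inputs>
   <!-- Middle rows: each g:detail is one row of td cells. -->
   <g:detail>
      <td>p:identity</td>
   </g:detail>
   <g:detail>
      <td>name="id1"</td>
   </g:detail>
   <!-- Last table row: one cell per output port. -->
   <g:outputs>
      <g:port id="out_1" primary="true" sequence="true">result</g:port>
   </g:outputs>
</g:atomic-step>
```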

Compound steps have inputs and outputs and contain one or more atomic or compound steps.

Edges connect inputs to outputs. Unbound output ports are connected to “sinks” that appear as simple dots in the graph, rather than boxes.

☞

Tip

In the descriptions that follow, “dot” and HTML attributes, when they’re allowed, are abbreviated “dot:*” and “h:*” to simplify the summary. The schema actually contains longer enumerations, mostly because of an apparent bug in Trang.

g:description

<g:description xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> (g:pipeline-container | g:graph-container)+ </g:description>

g:declare-step

<g:declare-step xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string name? = string type? = EQName base-uri? = anyURI id = string filename? = string version? = string> g:input*, g:output*, g:head, (g:atomic-step | g:compound-step | g:subpipeline)+, g:foot, g:edge* </g:declare-step>

If the filename is present, it’s the base filename used for this pipeline.

g:input

<g:input xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> g:port </g:input>

g:output

<g:output xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> g:port </g:output>

g:detail

<g:detail xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> h:td+ </g:detail>

g:head

<g:head xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> g:inputs, g:outputs </g:head>

g:foot

<g:foot xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> g:inputs, g:outputs </g:foot>

g:inputs

<g:inputs xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> g:port* </g:inputs>

g:outputs

<g:outputs xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> g:port* </g:outputs>

g:port

<g:port xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string id = string primary? = boolean sequence? = boolean> string </g:port>

g:atomic-step

<g:atomic-step xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string id? = string name = string type = EQName as? = string href? = string ref? = string select? = string option-name? = EQName variable-name? = EQName expression? = string filename? = string> g:inputs, g:detail*, g:outputs </g:atomic-step>

If the filename is present, this is a user-defined step and the filename identifies the declaration for this step.

g:subpipeline

<g:subpipeline xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string id = string name = string type = EQName ref = string> g:inputs, g:detail*, g:outputs </g:subpipeline>

A subpipeline is an atomic step that “calls” a compound step.

g:compound-step

<g:compound-step xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string id? = string name = string type = EQName> g:head, (g:atomic-step | g:compound-step | g:subpipeline)+, g:foot, g:edge* </g:compound-step>

g:edge

<g:edge xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string from = string to = string from-step? = string from-port? = string to-step? = string to-port? = string implicit? = boolean />

The only significant attributes on the g:edge elements are from and to, which must point to id values on port elements. The other edge attributes are provided as a convenience for deciding how the edges should be styled.
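Since from and to are references to port ids, a custom stylesheet can check them while it styles. The following sketch reports, and colors red, any edge whose endpoints can’t be found among the g:port ids in the document being styled. It assumes that port ids are unique within that document; if some endpoints (for example, sinks added by the default styling) are not represented as g:port elements, those edges will be reported too. The message text and color are illustrative.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dot="http://xmlcalabash.com/ns/dot"
                xmlns:g="http://xmlcalabash.com/ns/description"
                version="3.0">

<!-- Copy everything that isn't explicitly matched below. -->
<xsl:mode on-no-match="shallow-copy"/>

<!-- Index every port by its id. -->
<xsl:key name="port" match="g:port" use="@id"/>

<!-- Match edges where either endpoint does not resolve to a port. -->
<xsl:template match="g:edge[empty(key('port', @from))
                            or empty(key('port', @to))]">
  <xsl:message select="concat('Unresolved edge: ', @from, ' to ', @to)"/>
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="dot:color" select="'red'"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```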

g:pipeline-container

<g:pipeline-container xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> g:declare-step </g:pipeline-container>

g:graph-container

<g:graph-container xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string> g:graph </g:graph-container>

g:graph

<g:graph xmlns:g="http://xmlcalabash.com/ns/description" dot:* = string h:* = string filename? = string> g:input*, g:output*, g:declare-step, (g:atomic-step | g:compound-step | g:subpipeline)*, g:edge* </g:graph>

If the filename is present, it’s the base filename used for this graph.

td

ⓘ

Note

The HTML elements in this summary are explicitly in the HTML namespace. Graphviz doesn’t really understand XML; you must not specify the namespace declaration on these elements, nor may you use an explicit prefix.

The attributes in the description are the attributes that the Graphviz documentation claims are supported.

font

<font xmlns="http://www.w3.org/1999/xhtml" color? = string face? = string point-size? = string> (string | h:font | h:br | h:img | h:i | h:b | h:o | h:sub | h:sup | h:s | h:hr | h:vr)* </h:font>

br

 

img

<img xmlns="http://www.w3.org/1999/xhtml" scale? = string src = string />

i

<i xmlns="http://www.w3.org/1999/xhtml"> (string | h:font | h:br | h:img | h:i | h:b | h:o | h:sub | h:sup | h:s | h:hr | h:vr)* </h:i>

b

<b xmlns="http://www.w3.org/1999/xhtml"> (string | h:font | h:br | h:img | h:i | h:b | h:o | h:sub | h:sup | h:s | h:hr | h:vr)* </h:b>

o

<o xmlns="http://www.w3.org/1999/xhtml"> (string | h:font | h:br | h:img | h:i | h:b | h:o | h:sub | h:sup | h:s | h:hr | h:vr)* </h:o>

sub

<sub xmlns="http://www.w3.org/1999/xhtml"> (string | h:font | h:br | h:img | h:i | h:b | h:o | h:sub | h:sup | h:s | h:hr | h:vr)* </h:sub>

sup

<sup xmlns="http://www.w3.org/1999/xhtml"> (string | h:font | h:br | h:img | h:i | h:b | h:o | h:sub | h:sup | h:s | h:hr | h:vr)* </h:sup>

s

<s xmlns="http://www.w3.org/1999/xhtml"> (string | h:font | h:br | h:img | h:i | h:b | h:o | h:sub | h:sup | h:s | h:hr | h:vr)* </h:s>

hr

<hr xmlns="http://www.w3.org/1999/xhtml"> (string | h:font | h:br | h:img | h:i | h:b | h:o | h:sub | h:sup | h:s | h:hr | h:vr)* </h:hr>

vr

<vr xmlns="http://www.w3.org/1999/xhtml"> (string | h:font | h:br | h:img | h:i | h:b | h:o | h:sub | h:sup | h:s | h:hr | h:vr)* </h:vr>
