Chapter 5. Pipelines vs. Graphs
XML Calabash transforms a pipeline into a collection of graphs. It is the graphs that are evaluated at runtime.
To explore the difference between pipelines and graphs, consider the example pipeline in Figure 5.1, “Sample debugging pipeline”.
The pipeline declares two steps and uses one additional compound step.
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="https://xmlcalabash.com/ns/examples"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.0">
<p:output port="result" sequence="true"/>

<p:declare-step type="ex:ident">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:identity/>
</p:declare-step>

<p:xinclude>
  <p:with-input href="../xml/default-input.xml"/>
</p:xinclude>

<ex:ident name="ex1"/>

<p:for-each>
  <p:with-input select="/ex:doc/*"/>
  <p:identity name="id1"/>
  <p:add-attribute name="add"
                   attribute-name="role" attribute-value="test"/>
</p:for-each>

<p:wrap-sequence expand-text="false"
                 wrapper="Q{https://xmlcalabash.com/ns/examples}set"/>

<p:identity name="id2"/>

</p:declare-step>
The processor analyzes the pipeline (resolves use-when attributes, default readable ports, etc.) and constructs two models, one for each pipeline, like the ones shown in Figure 5.2, “The pipeline models”.
You can generate the graphs for any pipeline with the --graphs command line option.
The processor makes some changes in the pipeline. Some occurrences of p:document, p:inline, and even p:option and p:variable, are “promoted” to steps. This would be burdensome for authoring, but simplifies the implementation.
The processor then analyzes the pipeline to construct a set of graphs. It’s the graphs that actually get executed. The graphs for this pipeline are shown in Figure 5.3, “The graph models”.
Compound steps are represented as separate graphs. They have also grown “head” and “foot” steps that manage the boundaries.
Styling the SVG
The actual SVG diagrams are constructed in several steps. If you run with the --graphs and --debug options, all of the intermediate files will be saved in the graph output directory.
The starting point is the pipeline description itself, pipeline.xml. This document contains an XML description of the compiled pipelines and their graphs. The pipeline description vocabulary is described in Section 5.2, “Description reference”.
The graphs are ultimately constructed with Graphviz. The pipeline.xml document is styled, then converted to a number of “.dot” files, each processed with Graphviz.
The default styling produces results like the ones you’ve already seen.
In Graphviz terms, each step is a node (the nested rectangles), each node contains a table, and edges join the nodes. In particular, the first and last rows of each table contain the inputs and outputs for the step, the “ports” in Graphviz terminology.
The first 50 or so lines of a pipeline description are shown in Figure 5.4, “The description markup for the ex:ident graph”.
<g:description xmlns:g="http://xmlcalabash.com/ns/description">
  <g:pipeline-container xmlns="http://www.w3.org/1999/xhtml"
                        xmlns:dot="http://xmlcalabash.com/ns/dot"
                        xmlns:h="http://www.w3.org/1999/xhtml">
    <g:declare-step xmlns:cx="http://xmlcalabash.com/ns/extensions"
                    xmlns:ex="https://xmlcalabash.com/ns/examples"
                    xmlns:p="http://www.w3.org/ns/xproc"
                    xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    name="!declare-step"
                    base-uri="file:/Volumes/Projects/xproc/xmlcalabash3/documentation/src/examples/xpl/debugger.xpl"
                    id="IC475"
                    filename="p_declare-step"
                    version="3.0"
                    dot:label="p:declare-step">
      <g:input dot:peripheries="0"
               dot:shape="house"
               h:cellspacing="0"
               h:border="0"
               h:cellborder="1">
        <g:port id="output_5" primary="true" sequence="true">result</g:port>
      </g:input>
      <g:head dot:shape="parallelogram"
              dot:peripheries="0"
              h:cellspacing="0"
              h:border="0"
              h:cellborder="1">
        <g:inputs/>
        <g:outputs/>
      </g:head>
      <g:atomic-step dot:peripheries="0"
                     h:cellspacing="0"
                     h:border="0"
                     h:cellborder="1"
                     name="!document"
                     type="cx:document"
                     href="../xml/default-input.xml">
        <g:inputs/>
        <g:detail>
          <td>cx:document</td>
        </g:detail>
        <g:detail>
          <td>href="../xml/default-input.xml"</td>
        </g:detail>
        <g:outputs>
          <g:port id="_7" primary="true" sequence="false">result</g:port>
        </g:outputs>
      </g:atomic-step>
      <g:atomic-step dot:peripheries="0"
                     h:cellspacing="0"
Default styling has added attributes to this markup and “rows” to the atomic steps. Each g:detail element represents a row in the table and should contain one or more td elements.
Styling can also change the structure of the graph, but that risks creating a graph that doesn’t accurately reflect the structure of the original pipeline, which may be very misleading.
Attributes in the “dot” namespace (http://xmlcalabash.com/ns/dot) can be added to supply Graphviz attributes. Attributes in the “html” namespace (http://www.w3.org/1999/xhtml) can be added to supply the HTML styling attributes that Graphviz supports. You should use unqualified attribute names on the “HTML” table elements that appear in the output.
The pipeline description vocabulary enumerates the HTML attributes that Graphviz understands (according to the Graphviz documentation), but that enumeration is purely documentary: the styled document is not validated.
Default styling
If you don’t specify a stylesheet, the pipelines and graphs are styled with a default stylesheet. The default stylesheet:
Labels each step with its type and name (if a name was provided).
Puts pipeline inputs and outputs in “house” shapes with fewer borders.
Gives the head and foot elements a parallelogram shape with fewer table borders.
Labels variable and option atomic steps appropriately and adds the variable name as a detail. If a default expression is provided, that’s also added as a detail.
Adds links between subpipelines and their graphs.
Uses dashed lines for “depends” edges.
Adds “⋮” as a label to edges where the output allows a sequence but the input does not.
Despite the earlier admonition about changing the graph structure, the default style does make one structural change. Any output port that is unconnected is connected to a “sink” represented as a point.
Custom styling
Providing your own stylesheet allows you to achieve any custom styling that you’d like. Your stylesheet is applied after the standard stylesheet. This gives you the opportunity to add or remove g:detail elements, or to change the styling of elements.
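As a minimal sketch of adding a detail row, the following stylesheet appends one extra row to every atomic step that has an id attribute. It assumes that new td elements should be created in the XHTML namespace without a prefix, as they appear in Figure 5.4; the choice of the id attribute and the “id:” label text are purely illustrative.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:g="http://xmlcalabash.com/ns/description"
                xmlns="http://www.w3.org/1999/xhtml"
                version="3.0">

<!-- Copy everything that is not matched by a more specific template. -->
<xsl:mode on-no-match="shallow-copy"/>

<!-- Illustrative only: add a row showing the step's id, if it has one. -->
<xsl:template match="g:atomic-step[@id]">
  <xsl:copy>
    <!-- Attributes first, then the inputs row and any existing detail rows. -->
    <xsl:apply-templates select="@*, g:inputs, g:detail"/>
    <!-- The new row: unprefixed td in the XHTML (default) namespace. -->
    <g:detail>
      <td>id: <xsl:value-of select="@id"/></td>
    </g:detail>
    <!-- The outputs row must remain last. -->
    <xsl:apply-templates select="g:outputs"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

The template keeps the g:inputs, g:detail*, g:outputs order given in the description reference, so the new row lands between the existing detail rows and the outputs.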
Your stylesheet is applied to each pipeline and graph separately; this makes it much easier to manage links between elements. You’ll never accidentally match a name or ID from a different pipeline or graph. The same custom stylesheet is applied to both pipelines and graphs; you can distinguish between them by the root element, which will be g:pipeline-container for pipelines and g:graph-container for graphs.
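One way to use that distinction, sketched here under the assumption that the container is the root of the document your stylesheet sees, is to anchor match patterns on the container element. The colors are arbitrary; dot:color on g:edge is the same attribute used in Figure 5.5.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dot="http://xmlcalabash.com/ns/dot"
                xmlns:g="http://xmlcalabash.com/ns/description"
                version="3.0">

<!-- Copy everything that is not matched by a more specific template. -->
<xsl:mode on-no-match="shallow-copy"/>

<!-- Edges in a pipeline description: dark blue (arbitrary choice). -->
<xsl:template match="g:pipeline-container//g:edge">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="dot:color" select="'#000080'"/>
  </xsl:copy>
</xsl:template>

<!-- Edges in a graph description: dark green (arbitrary choice). -->
<xsl:template match="g:graph-container//g:edge">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="dot:color" select="'#006400'"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Because the attribute is added after the existing attributes are copied, it replaces any dot:color that earlier styling may have set on the same edge.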
Background colors applied to the container elements apply to the entire “canvas”; background colors on nested elements apply to smaller regions. Note that if the graph style of an element is “invis” (as it is on graph elements by default), the background color is ignored.
The stylesheet in Figure 5.5, “Custom pipeline and graph styling” colors primary inputs and outputs in pale blue and makes implicit connections gray rather than black. (The “implicitness” of connections is roughly analogous to when the connection was made automatically to the default readable port, but it should be taken with a grain of salt; compiling a pipeline involves adding, removing, and changing connections for a number of reasons and it’s not always clear which ones are implicit and which ones aren’t.)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dot="http://xmlcalabash.com/ns/dot"
                xmlns:g="http://xmlcalabash.com/ns/description"
                xmlns:h="http://www.w3.org/1999/xhtml"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="3.0">
<xsl:output method="xml" encoding="utf-8" indent="no"/>

<!-- Copy everything that isn't matched by a more specific template. -->
<xsl:mode on-no-match="shallow-copy"/>

<!-- Give primary ports a pale blue background. -->
<xsl:template match="g:port[@primary='true']">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="h:bgcolor" select="'#ccccff'"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>

<!-- Draw implicit connections in gray rather than black. -->
<xsl:template match="g:edge[@implicit = 'true']">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="dot:color" select="'#8a8a8a'"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
The result of applying custom styling is shown in Figure 5.6, “The ex:ident graph with custom styling”.
Description reference
The XML vocabulary used to describe and style pipelines and graphs is a bit ad hoc. It started out as an internal format without any serious documentation. The observation that the graphs are a useful teaching aid persuaded me to try to make it a little more user friendly.
The pipeline description and the graph description have been made as similar as possible so that they can be styled more easily. The description schema doesn’t attempt to be prescriptive; in practice some combinations of attributes don’t arise, even though they’d be allowed by the schema. For example, a cx:document atomic step may have an href attribute, but no other step type will, nor will a cx:document ever have an expression.
The format leans heavily on the features of Graphviz. Familiarity with the Graphviz “dot” model and the way that nodes can be formatted with HTML table markup will probably be an aid to comprehension.
It’s perhaps a little easier to understand the vocabulary starting from the center and working out. Most of the interesting markup is in atomic steps. An atomic step has inputs and outputs; those become table cells in the first and last rows of the description, respectively. In between the inputs and outputs, additional rows can be inserted with g:detail elements that also contain table cells. The atomic step container becomes a table. The table is wrapped in a node.
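To make that inside-out structure concrete, here is a hand-written sketch of what the description of the p:add-attribute step from the sample pipeline might look like. The port ids and the detail text are invented for the example; real descriptions are generated by the processor and normally carry additional dot:* and h:* attributes.

```xml
<g:atomic-step xmlns:g="http://xmlcalabash.com/ns/description"
               xmlns:p="http://www.w3.org/ns/xproc"
               xmlns="http://www.w3.org/1999/xhtml"
               name="add"
               type="p:add-attribute">
  <!-- First table row: one cell per input port. -->
  <g:inputs>
    <!-- The id values here are hypothetical. -->
    <g:port id="in_1" primary="true" sequence="false">source</g:port>
  </g:inputs>
  <!-- Middle rows: each g:detail is one row holding one or more cells. -->
  <g:detail>
    <td>p:add-attribute</td>
  </g:detail>
  <g:detail>
    <td>add</td>
  </g:detail>
  <!-- Last table row: one cell per output port. -->
  <g:outputs>
    <g:port id="out_1" primary="true" sequence="false">result</g:port>
  </g:outputs>
</g:atomic-step>
```

Read from top to bottom, the element maps onto the rows of the table that Graphviz draws for the step.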
Compound steps have inputs and outputs and contain one or more atomic or compound steps.
Edges connect inputs to outputs. Unbound output ports are connected to “sinks” that appear as simple dots in the graph, rather than boxes.
In the descriptions that follow, “dot” and HTML attributes, when they’re allowed, are abbreviated “dot:*” and “h:*” to simplify the summary. The schema actually contains longer enumerations, mostly because of an apparent bug in Trang.
g:description
<g:description xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
(g:pipeline-container |
g:graph-container)+
</g:description>
g:declare-step
<g:declare-step xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string
name? = string
type? = EQName
base-uri? = anyURI
id = string
filename? = string
version? = string>
g:input*,
g:output*,
g:head,
(g:atomic-step |
g:compound-step |
g:subpipeline)+,
g:foot,
g:edge*
</g:declare-step>
If the filename is present, it’s the base filename used for this pipeline.
g:input
<g:input xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
g:port
</g:input>
g:output
<g:output xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
g:port
</g:output>
g:detail
<g:detail xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
h:td+
</g:detail>
g:head
<g:head xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
g:inputs,
g:outputs
</g:head>
g:foot
<g:foot xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
g:inputs,
g:outputs
</g:foot>
g:inputs
<g:inputs xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
g:port*
</g:inputs>
g:outputs
<g:outputs xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
g:port*
</g:outputs>
g:port
<g:port xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string
id = string
primary? = boolean
sequence? = boolean>
string
</g:port>
g:atomic-step
<g:atomic-step xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string
id? = string
name = string
type = EQName
as? = string
href? = string
ref? = string
select? = string
option-name? = EQName
variable-name? = EQName
expression? = string
filename? = string>
g:inputs,
g:detail*,
g:outputs
</g:atomic-step>
If the filename is present, this is a user-defined step and the filename identifies the declaration for this step.
g:subpipeline
<g:subpipeline xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string
id = string
name = string
type = EQName
ref = string>
g:inputs,
g:detail*,
g:outputs
</g:subpipeline>
A subpipeline is an atomic step that “calls” a compound step.
g:compound-step
<g:compound-step xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string
id? = string
name = string
type = EQName>
g:head,
(g:atomic-step |
g:compound-step |
g:subpipeline)+,
g:foot,
g:edge*
</g:compound-step>
g:edge
<g:edge xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
from = string
to = string
from-step? = string
from-port? = string
to-step? = string
to-port? = string
implicit? = boolean />
The only significant attributes on the g:edge elements are from and to, which must point to id values on port elements. The other edge attributes are provided as a convenience for deciding how they should be styled.
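Because from and to are ID references to ports, a custom stylesheet can follow them to the port elements and style an edge according to its endpoints. The sketch below assumes that from identifies the port where the edge starts; the key name and the choice of a dotted line are illustrative.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dot="http://xmlcalabash.com/ns/dot"
                xmlns:g="http://xmlcalabash.com/ns/description"
                version="3.0">

<!-- Copy everything that is not matched by a more specific template. -->
<xsl:mode on-no-match="shallow-copy"/>

<!-- Index every port by its id so that edges can look up their endpoints. -->
<xsl:key name="port-by-id" match="g:port" use="@id"/>

<!-- Match edges whose "from" port exists and is not marked primary,
     and draw them with a dotted line. -->
<xsl:template match="g:edge[key('port-by-id', @from)[not(@primary = 'true')]]">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="dot:style" select="'dotted'"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Since the stylesheet is applied to each pipeline and graph separately, the key lookup can only find ports in the same pipeline or graph as the edge. Note that setting dot:style this way would override any style that earlier styling placed on the same edge, such as the dashed lines used for “depends” edges.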
g:pipeline-container
<g:pipeline-container xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
g:declare-step
</g:pipeline-container>
g:graph-container
<g:graph-container xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string>
g:graph
</g:graph-container>
g:graph
<g:graph xmlns:g="http://xmlcalabash.com/ns/description"
dot:* = string
h:* = string
filename? = string>
g:input*,
g:output*,
g:declare-step,
(g:atomic-step |
g:compound-step |
g:subpipeline)*,
g:edge*
</g:graph>
If the filename is present, it’s the base filename used for this graph.
td
The HTML elements in this summary are explicitly in the HTML namespace. Graphviz doesn’t really understand XML; you must not specify the namespace declaration on these elements, nor may you use an explicit prefix.
The attributes in the description are the attributes that the Graphviz documentation claims are supported.
<td xmlns="http://www.w3.org/1999/xhtml"
align? = center|left|right|text
balign? = center|left|right
valign? = middle|bottom|top
bgcolor? = string
border? = string
cellpadding? = string
cellspacing? = string
color? = string
colspan? = integer
fixedsize? = false|true
gradientangle? = double
height? = string
href? = string
id? = string
port? = string
rowspan? = string
sides? = string
style? = string
target? = string
title? = string
tooltip? = string
width? = string>
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</td>
font
<font xmlns="http://www.w3.org/1999/xhtml"
color? = string
face? = string
point-size? = string>
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</font>
br
<br xmlns="http://www.w3.org/1999/xhtml"
balign? = center|left|right />
img
<img xmlns="http://www.w3.org/1999/xhtml"
scale? = string
src = string />
i
<i xmlns="http://www.w3.org/1999/xhtml">
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</i>
b
<b xmlns="http://www.w3.org/1999/xhtml">
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</b>
o
<o xmlns="http://www.w3.org/1999/xhtml">
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</o>
sub
<sub xmlns="http://www.w3.org/1999/xhtml">
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</sub>
sup
<sup xmlns="http://www.w3.org/1999/xhtml">
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</sup>
s
<s xmlns="http://www.w3.org/1999/xhtml">
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</s>
hr
<hr xmlns="http://www.w3.org/1999/xhtml">
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</hr>
vr
<vr xmlns="http://www.w3.org/1999/xhtml">
(string |
h:font |
h:br |
h:img |
h:i |
h:b |
h:o |
h:sub |
h:sup |
h:s |
h:hr |
h:vr)*
</vr>