Chapter 3. Running steps in parallel
If your processor has multiple cores (or if you have multiple processors), you may be able to run pipelines with multiple threads. This allows some steps to run in parallel. The version command will tell you how many threads are available.
Even if multiple threads are available, XML Calabash will only use a single thread unless you enable more in the configuration file. You must explicitly enable support for multiple threads because writing multithreaded pipelines demands extra care.
Consider Example 3.1, “A dubious pipeline”. In a single-threaded environment, that pipeline might run correctly. Whether it does depends entirely on the order in which the processor chooses to run the steps.
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.0">

  <p:output port="result"/>

  <p:store name="store" href="/tmp/file.txt">
    <p:with-input>
      <p:inline content-type="text/plain">One Two Three</p:inline>
    </p:with-input>
  </p:store>

  <p:sink/>

  <p:load name="load" href="/tmp/file.txt"/>

  <p:text-count/>

</p:declare-step>
In a multithreaded environment, it’s unlikely, but unfortunately not impossible, that the pipeline would be successful. The easiest way to see why is by looking at the pipeline graph:
There you can see that there are two independent sequences of steps. Steps in the first sequence are labeled “①”, steps in the second are labeled “②”. (There’s actually a third sequence; it contains the sink step represented as a • in the graph.)
These are “thread groups”. Static analysis of the pipeline divides all of the children of each compound step into thread groups. A thread group consists of a linear sequence of connected steps. Because the steps in a group are connected, they have to run sequentially, so there’s no benefit in putting them into different threads.
At runtime, the processor considers the number of threads actually available and constructs one group for each available thread. (This may result in combining several thread groups if there are fewer available threads than statically determined thread groups.)
The processor makes no effort to determine an optimal arrangement if it has to combine several thread groups. In principle, you’d want long-running steps distributed into different threads and short-running steps combined together. The processor only ensures that deadlock won’t occur across different threads.
Then a thread is started for each runtime thread group, and the groups run (to the greatest extent possible) in parallel. In a complicated pipeline, there may be dependencies that cross thread boundaries; in that case, the threads pause where necessary until the dependencies are satisfied.
The order in which steps in different thread groups are executed is indeterminate. In Example 3.1, “A dubious pipeline”, it’s likely that the inline and load steps will execute “at the same time”, and the pipeline will fail (because the store step hasn’t created the file yet) or, if the file already exists, produce the wrong result (because the “old version” will be loaded before the “new version” is written). But it isn’t impossible, depending on how heavily loaded the processor is and a wide variety of unknowable circumstances, that the inline and store steps could run before the load step.
The problem here is the invisible dependency between the p:load step and the “preceding” p:store step. It’s invisible because it isn’t expressed in the connections between the steps.
There are a number of ways to fix this pipeline, but for the purposes of this chapter, let’s fix it by adding a “depends” link between the load step and the store step. The resulting pipeline will run correctly, as we can see from its graph:
Even though the store and load steps are in separate thread groups, the dependency link ensures that the store step will always run first. It has also removed almost all of the parallelism from the graph, making threading largely inconsequential.
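Concretely, the fix described above is a one-attribute change to the load step from Example 3.1. XProc 3.0 expresses such ordering constraints with the depends attribute, whose value is a list of step names:

```xml
<!-- "store" is the name of the p:store step in Example 3.1.
     The depends attribute makes this step wait until the store
     step has finished, even when the two steps end up in
     different thread groups. -->
<p:load name="load" href="/tmp/file.txt" depends="store"/>
```

The rest of the pipeline is unchanged; the attribute affects only scheduling, not the flow of documents between steps.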
The pipeline in Example 3.2, “A highly parallel pipeline” shows a pipeline that benefits much more substantially from multithreading.
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.0">

  <p:output port="result"/>

  <p:identity name="source-A">
    <p:with-input><A/></p:with-input>
  </p:identity>

  <p:sleep name="processed-A" duration="PT2S"/>

  <p:identity name="source-B">
    <p:with-input><B/></p:with-input>
  </p:identity>

  <p:sleep name="processed-B" duration="PT3S"/>

  <p:identity name="source-C">
    <p:with-input><C/></p:with-input>
  </p:identity>

  <p:sleep name="processed-C" duration="PT4S"/>

  <p:wrap-sequence wrapper="results">
    <p:with-input pipe="@processed-A @processed-B @processed-C"/>
  </p:wrap-sequence>

</p:declare-step>
The graph shows that the three processing steps (real work being simulated here with p:sleep) can be run in parallel.
Run with a single thread, this pipeline will take at least nine seconds (2 + 3 + 4) to execute. With three threads, it is likely to finish in just over four seconds, the duration of the longest single step.
If there is any opportunity for parallelism in a pipeline, using multiple threads will introduce unpredictable execution order into it. This can cause pipelines with invisible dependencies to fail or produce incorrect results. It can also make pipelines run substantially faster.
Is threading worth it? It depends.