cx:selenium

Name

cx:selenium — Drive a web browser with Selenium.

Synopsis

The cx:selenium step uses Selenium to automate a web browser. The step can drive a web browser and extract all or part of rendered pages.

Input port	Primary	Sequence	Content types
source	✔		text xml

Output port	Primary	Sequence	Content types
result	✔	✔

Option name	Type	Default value
arguments	xs:string*	()
browser	xs:string?	()
capabilities	map(xs:QName, item())?	()

This is an extension step; to use it, your pipeline must include its declaration. One way to do that is to import the extension library at the top of your pipeline:

<p:import href="https://xmlcalabash.com/ext/library/selenium.xpl"/>

Declaration

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
   <p:input port="source" content-types="text xml"/>
   <p:output port="result" sequence="true"/>
   <p:option name="browser" as="xs:string?"/>
   <p:option name="capabilities" as="map(xs:QName, item())?"/>
   <p:option name="arguments" as="xs:string*"/>
</p:declare-step>

Errors

Code	Description
`cxerr:XC0023`	It is a dynamic error (`cxerr:XC0023`) if the page URI does not match at least one of the whitelist expressions.

Description

Selenium automates browsers. Steps like p:http-request can interact with the web, making selected, individual requests. The cx:selenium step fires up an actual web browser and interacts with it. In practice, what this means is that JavaScript is executed and the result is available to the step.

Selenium is widely used for testing web applications. There are lots of programming APIs that can drive it. What the cx:selenium step does is expose that functionality through a small scripting language.

The goal here is to make a language that’s easy to use for common sorts of tasks, not one that can do everything that Selenium can do. It’s also been invented in a somewhat ad hoc manner by someone with relatively little Selenium programming experience. Suggestions for improvements are welcome.

Whitelisting

The cx:selenium step runs an actual web browser. In principle, anything you can do with a web browser, you can do with this step: log in to your bank, order pizza, etc. Care is advised.

It is possible to whitelist the URIs that cx:selenium will load. Add a selenium element to your configuration:

<x:selenium xmlns:x="https://xmlcalabash.com/ext/ns/selenium"
            whitelist="http://localhost.*
                       https://testdata.xmlcalabash.com/.*"/>

The whitelist attribute is a space-separated list of regular expressions. If the page URI matches one of those regular expressions, the step will run. It is a dynamic error (cxerr:XC0023) if the page URI does not match at least one of the whitelist expressions.
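
As an illustration, assuming the whitelist shown above is in effect, the following minimal script should be allowed to run because its page URI matches the second expression. The output text is only a placeholder.

```text
script version 0.3 .
page "https://testdata.xmlcalabash.com/cities/" .

# The page URI matches "https://testdata.xmlcalabash.com/.*"
output "Page loaded." .
close .
```

Changing the page to a URI that matches neither expression, for example the hypothetical "https://example.org/", would be expected to raise cxerr:XC0023.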

The scripting language

A cx:selenium script begins with a version declaration, identifies the page to open in the browser, and has one or more statements.

Figure 1. Overall structure of a cx:selenium script

The scripting language is described in part with “railroad diagrams”. They show how a script is built up from its constituent constructs. In the diagrams, an oval containing bold text represents something you literally type. Words in rectangles are references to other parts of the grammar; what’s expected there is an instance of that construct. Generally speaking, whitespace is expected between the ovals and boxes, except that whitespace around punctuation is often optional.

The summary in Figure 1, “Overall structure of a cx:selenium script” indicates that a script begins with the literal text “script version 0.3” followed by (optional) whitespace, the literal text “.”, whitespace, the literal text “page”, whitespace, any string, the literal text “.”, and one or more statements. For example:

script version 0.3. page "http://example.com/" .
output "Hello, world." .

The following versions are supported:

0.2: The first published version.
0.3: Introduces the “until different” clause on find and fixes an error in the grammar of the send statement.

All version 0.2 scripts are valid in version 0.3.

Statement

There are four kinds of blocks and about 20 different kinds of statements.

Figure 2. A statement

A “simple” statement stands alone. A subset of the simple statements, the “compound” statements, can be joined together and performed at once. (This is analogous to the Selenium concept of building a sequence of actions and then performing them.)

Figure 3. A simple statement

Figure 4. A compound statement

Figure 5. A perform statement

Blocks

There are four kinds of blocks: three conditionals (if, while, until) and subroutines.

Conditional blocks

Figure 6. An if block

The statements in an “if” block are evaluated if (and only if) the effective boolean value of the test expression is true. An expression is a quoted string containing an XPath expression.

Figure 7. A while block

The statements in a “while” block are evaluated repeatedly as long as the effective boolean value of the test expression is true. If the test expression is initially false, the statements in the block are not executed at all.

Figure 8. An until block

The statements in an “until” block are evaluated repeatedly until the effective boolean value of the test expression is true. The statements are always evaluated at least once; the expression is tested at the end of each loop.
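
The difference between the two loops can be sketched with fragments adapted from the Example at the end of this page. The until loop runs its body before the first test, so $row can be set inside the loop; the while loop tests first, so $city is set before the loop begins.

```text
script version 0.3 .
page "https://testdata.xmlcalabash.com/cities/" .

# until: body first, test afterwards; stops once a row has been found
until "not(empty($row))" do
  find $row by selector = "table tbody tr" .
  pause PT0.25S .
done

# while: test first; the body is skipped if Appleton is already listed
find $city by xpath = "//td[. = 'Appleton']" .
while "empty($city)" do
  find $button by selector = "button" .
  scroll to $button .
  click $button .
  pause PT0.25S .
  find $city by xpath = "//td[. = 'Appleton']" .
done

close .
```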

Subroutines

Figure 9. A subroutine

Subroutines are a way to group statements that you can evaluate with the call statement. Subroutines are collected before script evaluation begins, so they can appear anywhere a statement can occur, even if that’s after call statements that refer to them. All subroutine names must be unique.

The name and the first statement must be separated by at least one newline.
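
A minimal sketch of a subroutine that is called before it is defined. The subroutine name “report” is made up for this illustration; the statements inside it reuse forms shown elsewhere on this page.

```text
script version 0.3 .
page "https://testdata.xmlcalabash.com/cities/" .

# The call precedes the definition; subroutines are collected first
call report .
close .

# The name is followed by a newline before the first statement
subroutine report
  set $width to window width .
  output xpath "string($width)" to result .
end
```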

find statement

A find statement locates an element on the page and stores its (HTML) content in a variable. With the all keyword, it finds all of the elements that match the locator. If wait is added, the processor will wait up to the specified duration for the locator to find at least one match. A pause specifies the duration to wait between attempts; the default is 0.25s.

If an “until different” clause is specified, the find statement will loop until the value returned differs from the value in $originalvar. If either variable is a list (a “find all”), they are different only if no value in the new result appears in $originalvar.

In Selenium, it’s an error if the locator doesn’t match anything. In the cx:selenium step, it’s not an error; the variable will simply hold the empty sequence. If, however, a further attempt is made to perform a Selenium action with the variable (click on it or send text to it, for example), an error will occur. You can avoid this by first testing if the variable is empty.
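
One way to guard an action, sketched here with a while loop and the cities test page from the Example section, is to retry the find until the variable is no longer empty and only then act on it.

```text
script version 0.3 .
page "https://testdata.xmlcalabash.com/cities/" .

# No match is not an error; $button is simply empty
find $button by selector = "button" .

# Retry until the locator matches something
while "empty($button)" do
  pause PT0.25S .
  find $button by selector = "button" .
done

# $button is known to be non-empty here, so acting on it is safe
scroll to $button .
click $button .
close .
```

Note that this sketch loops forever if the element never appears.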

Figure 10. The find statement

The token that follows “by” identifies the kind of match to be performed and consequently the form that the following string must have:

Token	Find by …	Example string
`name`	Name attribute	button-name
`selector`	CSS selector	.someClass
`id`	ID	someId
`link-text`	Exact text of a link	click here
`partial-link-text`	Partial text of a link	click
`tag`	Element name	form
`class`	Class name	someClass
`xpath`	XPath (1.0) expression	/html/body/h1[2]
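
A short sketch of three of these tokens in use against the cities test page. It assumes that every token is written in the same “by token = string” form that the Example section uses for selector and xpath.

```text
script version 0.3 .
page "https://testdata.xmlcalabash.com/cities/" .

# By element name
find $table by tag = "table" .

# By CSS selector
find $button by selector = "button" .

# By XPath 1.0 expression, evaluated by Selenium
find $cell by xpath = "//td[. = 'Appleton']" .

# Report whether the table was found
output xpath "string(not(empty($table)))" to result .
close .
```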

There is a single, global scope for variables, and variables are mutable. Whether they are set with find or set, and whether they are set in the main body of the script or in a subroutine, they always have the last value set.
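
The following sketch illustrates the single scope. The variable name $status and the subroutine name “update” are made up for this example.

```text
script version 0.3 .
page "https://testdata.xmlcalabash.com/cities/" .

set $status to string "set in main body" .
call update .

# The subroutine assigned $status last, so its value is the one seen here
output xpath "$status" to result .
close .

subroutine update
  set $status to string "set in subroutine" .
end
```

If variables behave as described above, the result port would receive a single document containing “set in subroutine”.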

set statement

A set statement sets a variable to some value. This can be some property of the window or page, a cookie, a string, the result of evaluating an XPath expression, or the property of some element on the page.

Figure 11. The set statement

Where property is a synonym for name. The token that follows “to” identifies the kind of query to be performed.

window

The size or location of the browser window.

set $width to window width.

page

The URL or title of the page.

set $title to page title.

cookie

The value of the cookie named. If the cookie name doesn’t conform to the constraints of a name, put the name in a quoted string.

set $login to cookie login-id.

string

The string provided.

set $hello to string "Hello, world.".

xpath

The result of evaluating the XPath expression. Unlike the XPath expression in a find statement, which is evaluated by Selenium and must be an XPath 1.0 expression, this expression is evaluated by the step and is an XPath 3.0 expression.

set $narrow to xpath "$width lt 600".

element

Some property of the element in $varname. For example, if you used a find statement to locate an input element on the page ($input), you could use a set statement to obtain its value:

set $value to element $input value .

This differs from find element … which returns the actual element.
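
Putting find and set together, a sketch might look like the following. The page URI and the field name “email” are hypothetical; substitute a page that actually contains such an input.

```text
script version 0.3 .
page "http://localhost:8080/form.html" .

# Locate the input by its name attribute
find $input by name = "email" .

# Read its value property into $value
set $value to element $input value .

output xpath "$value" to result .
close .
```

If the find matches nothing, $input is empty and the set statement would be expected to fail, so a real script may want to test for that first.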

send statement

The send statement sends text to the input on the page identified by $varname. Strings cannot contain newlines, so if you want to send a longer fragment, delimit it with “¶”, “⁋”, “§”, or formfeed characters.

Figure 12. The send statement

Version 0.3 fixed a bug where the $varname was only optional in the simple “string” case.

click statement

The click statement simulates clicking on the element identified by $varname.

Figure 13. The click statement

wait statement

There are two forms of the wait statement: “wait until ready” and “wait for a duration”. A wait clause can also appear in the find statement.

The “wait until ready” statement waits until the page is ready. That is, it waits until the page indicates that document.readyState is “complete”.

Figure 14. The wait until ready statement

The “wait for a duration” statement waits for a specified duration.

Figure 15. The wait statement

pause statement

The pause statement waits for a specified duration.

Figure 16. The pause statement

message statement

The message statement computes the value of the expression and sends it to the message handler at the “info” level.

Figure 17. The message statement

output statement

The output statement sends output from the step. The element on the page identified by $varname, arbitrary text, or the result of evaluating an expression can be sent to the result port.

Figure 18. The output statement

Each output statement creates a new document on the result port.
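
For example, this sketch, which uses the plain string form of output shown earlier, would be expected to produce two separate documents on the result port.

```text
script version 0.3 .
page "https://testdata.xmlcalabash.com/cities/" .

# Each statement yields its own document
output "first document" .
output "second document" .
close .
```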

window statement

The window statement updates aspects of the browser window.

Figure 19. The window statement

cookie statement

The cookie statement sets a cookie. If the name of the cookie satisfies the constraints of a name, then you can just use the name. For arbitrary names, use a string.

Figure 20. The cookie statement

scroll statement

The scroll statement attempts to scroll the browser window. This statement seems to be somewhat inconsistently implemented by browsers. Firefox, for example, won’t scroll to an element not already visible in the viewport.

To support scrolling arbitrarily, the cx:selenium step implements “scroll to $varname” by evaluating the JavaScript expression varname.scrollIntoView(true).

Figure 21. The scroll statement

move statement

The move statement moves to the element identified by $varname.

Figure 22. The move statement

release statement

The release statement releases the mouse after a “click and hold” statement.

Figure 23. The release statement

drag statement

The drag statement drags one element to another.

Figure 24. The drag statement

navigate statement

The navigate statement changes the page in the browser.

Figure 25. The navigate statement

refresh statement

The refresh statement refreshes the page.

Figure 26. The refresh statement

reset statement

The reset statement resets Selenium.

Figure 27. The reset statement

close statement

The close statement closes the browser.

Figure 28. The close statement

This ends the script.

key statement

The key statement presses or releases a key.

Figure 29. The key statement

Where a keyname is one of the names in Figure 30, “The key names” and a char is any string containing a single character.

Figure 30. The key names

call statement

The call statement calls a defined subroutine.

Figure 31. The call statement

Names

A name is a letter or an underscore followed by letters, numbers, and a variety of punctuation characters.

Figure 32. Names

Figure 33. Name start characters

Where “UnicodeL” is any Unicode character in the “L” category (letters).

Figure 34. Name following characters

Where “UnicodeNd” is any Unicode character in the “Nd” category (decimal numbers) and “UnicodeMn” is any Unicode character in the “Mn” category (nonspacing marks).

Variable names

As in XPath, variable names begin with a $.

Figure 35. Variable names

Strings

Strings begin and end with quote delimiters and must not break across lines.

Figure 36. Strings

Durations

A duration is a number of milliseconds or an xs:dayTimeDuration.

Figure 37. Durations
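
As a sketch, the two pause statements below should request the same delay, assuming a bare integer is accepted wherever a duration is expected.

```text
script version 0.3 .
page "https://testdata.xmlcalabash.com/cities/" .

# A number of milliseconds
pause 250 .

# The same delay written as an xs:dayTimeDuration
pause PT0.25S .

close .
```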

Integers and numbers

Numeric values are positive or negative integers, or unsigned decimal numbers.

Figure 38. Integers

Note that negative integers are forbidden in some contexts (for example, window sizes).

Figure 39. Numbers

There are no use cases for negative decimal numbers, so signs are not allowed.

Example

The following pipeline uses the cx:selenium step to load the “cities” example page. This page displays a table of cities in the United Kingdom with the country they’re in and their latitude and longitude. A “more” button loads more cities.

The Selenium script clicks the “more” button until the city of Appleton is in the table, then it returns the latitude and longitude in two text documents.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                name="main" version="3.0">
  <p:import href="https://xmlcalabash.com/ext/library/selenium.xpl"/>

  <p:output port="result" serialization="map{'method':'text'}" sequence="true"/>

  <cx:selenium xmlns:h="http://www.w3.org/1999/xhtml">
    <p:with-option name="arguments" select="('--headless')"/>
    <p:with-input>
      <p:inline content-type="text/plain">
script version 0.3 .
page "https://testdata.xmlcalabash.com/cities/" .

# Wait until the table has been populated
until "not(empty($row))" do
  find $row by selector = "table tbody tr" .
  pause PT0.25S .
done

# Search for Appleton, hit more until we find it
find $city by xpath = "//td[. = 'Appleton']".
while "empty($city)" do
  call clickNext .
  find $city by xpath = "//td[. = 'Appleton']".
done

find $row by xpath = "//tr[td[. = 'Appleton']]" .

output xpath "normalize-space($row/*:td[3])" to result .
output xpath "normalize-space($row/h:td[4])" to result .

close .

subroutine clickNext
  find $button by selector = "button" .
  scroll to $button .
  click $button .
  pause PT0.25S .
end
      </p:inline>
    </p:with-input>
  </cx:selenium>
</p:declare-step>

The script is not written in an especially efficient way; it’s written to demonstrate a variety of statements and features.

Dependencies

This step is included in the XML Calabash application. If you are getting XML Calabash from Maven, you will also need to include these additional dependencies:

org.nineml:coffeefilter:3.2.9
org.nineml:coffeegrinder:3.2.9
org.seleniumhq.selenium:selenium-java:4.28.1
