Name
cx:fileset — Approximates the Ant notion of a fileset.
Synopsis
This step selects files using a vocabulary based on the FileSet vocabulary of Ant.
Input port | Primary | Sequence | Content types | Default binding |
---|---|---|---|---|
source | ✔ | ✔ | xml | p:empty |
Output port | Primary | Sequence | Content types | Default binding |
---|---|---|---|---|
result | ✔ | ✔ | xml | |
Option name | Type | Default value | Required |
---|---|---|---|
path | xs:string | | ✔ |
case-sensitive | xs:boolean | true() | |
default-excludes | xs:boolean | true() | |
detailed | xs:boolean | false() | |
error-on-missing-dir | xs:boolean | true() | |
excludes | xs:string? | () | |
follow-symlinks | xs:boolean | true() | |
includes | xs:string? | () | |
<p:import href="https://xmlcalabash.com/ext/library/fileset.xpl"/>
Declaration
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source" content-types="xml" sequence="true">
    <p:empty/>
  </p:input>
  <p:output port="result" content-types="xml" sequence="true"/>
  <p:option name="path" as="xs:string" required="true"/>
  <p:option name="default-excludes" as="xs:boolean" select="true()"/>
  <p:option name="case-sensitive" as="xs:boolean" select="true()"/>
  <p:option name="error-on-missing-dir" as="xs:boolean" select="true()"/>
  <p:option name="follow-symlinks" as="xs:boolean" select="true()"/>
  <p:option name="includes" as="xs:string?"/>
  <p:option name="excludes" as="xs:string?"/>
  <p:option name="detailed" as="xs:boolean" select="false()"/>
</p:declare-step>
Errors
Code | Description |
---|---|
cxerr:XC0043 | It is a dynamic error (cxerr:XC0043) if the source document is not a fileset document. |
cxerr:XC0044 | It is a dynamic error (cxerr:XC0044) if the default-excludes, case-sensitive, error-on-missing-dir, or follow-symlinks setting is specified both on the step and on the fileset element. |
cxerr:XC0051 | It is a dynamic error (cxerr:XC0051) if the path provided is not a file: URI. |
cxerr:XC0052 | It is a dynamic error (cxerr:XC0052) if the path provided does not exist and error-on-missing-dir is true. |
Description
This step is based on the Ant notion of FileSets.
Conceptually, all of the files below some starting point (the path) are filtered through a set of inclusions and exclusions. Any files that satisfy the filters are further subjected to a set of selectors (which may use mappers) to determine if they are in the final result or not.
The cx:fileset step differs from Ant in some ways:
The files selected by cx:fileset are normalized with urify(). They exclusively use “/” as the path separator.
The step is only concerned with selecting existing files.
The “type” selector is not provided.
Attributes related to filtering directories are not provided.
It doesn’t support anything like Ant FilterSets which change the contents of files.
The names of the elements and attributes have been changed to make them more consistent with XProc (“compound-name” instead of “compoundname”).
The “scriptselector” is not supported.
You can’t provide a custom filter or mapper class; the step isn’t extensible in that way.
Ant has a number of features for defining filters, patterns, and selectors and then reusing them by reference. XProc has lots of features for constructing XML documents, so those mechanisms are not supported.
In order to avoid having all of Ant become a dependency, this is a reimplementation of the functionality. The Ant documentation isn’t especially precise. It’s possible that there are unintentional semantic differences beyond the differences outlined here. (Please report them.)
This step differs substantially from the p:directory-list step in that the include and exclude filters are “globs”, not regular expressions!
If the source port is not empty, it must contain a fileset document. It is a dynamic error (cxerr:XC0043) if the source document is not a fileset document.
Where regular expressions are used, they use the regular expression syntax of the underlying platform. This will change in the future and the XPath regular expression syntax will be used instead. For most simple cases, there’s no difference.
Step options
The cx:fileset step has several options.
path
The path below which files are selected. The value provided will be normalized with the urify() function. The resulting URI must be a “file” URI. It is a dynamic error (cxerr:XC0051) if the path provided is not a file: URI.
case-sensitive
Several selectors support case-sensitive or case-insensitive comparisons. This option provides the default value for those selectors.
default-excludes
If this option is true, the default exclusions are automatically used.
detailed
If this option is true, detailed information will be provided about each file. See p:directory-list for more details.
error-on-missing-dir
It is a dynamic error (cxerr:XC0052) if the path provided does not exist and error-on-missing-dir is true.
excludes
A space or comma separated list of globs. Any file that matches will be excluded.
follow-symlinks
Several selectors have an option to follow symbolic links. This option provides the default value for those selectors.
includes
A space or comma separated list of globs. Any file that matches will be included, unless it also matches an exclusion. If no inclusions are provided, all of the descendants are included by default.
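As an illustration of how these options combine, the following sketch selects XML and XProc files below a starting directory while skipping drafts. The directory and the file name patterns are hypothetical, and namespace declarations are omitted.

```xml
<!-- Hypothetical layout: select *.xml and *.xpl files anywhere below
     /path/to/project, except files whose names start with "draft-". -->
<cx:fileset path="/path/to/project"
            includes="*.xml *.xpl"
            excludes="draft-*"/>
```

Because none of these globs starts with “/”, each one is matched at any depth below the path.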
The fileset document
The fileset document contains zero or more include or exclude elements and zero or more selectors. Some selectors may contain mappers.
A selector tests each file that is filtered through the includes and excludes. If the file “passes” the selection test, it remains included. If it “fails”, it is removed.
Selectors that compare the file in question against other files on the filesystem may use mappers to transform the filename. For example:
<cx:fileset path="/path/to/input" includes="*.svg">
  <p:with-input>
    <fileset>
      <present target-dir="/path/to/output">
        <glob-mapper from="*.svg" to="*.png"/>
      </present>
    </fileset>
  </p:with-input>
</cx:fileset>
This fileset begins with all of the SVG files under /path/to/input. For each one, the present selector tests, does this file exist at an equivalent location under the /path/to/output directory? The glob-mapper changes the .svg extension to .png.
Suppose /path/to/input contains a.svg, b.svg, and c.svg and /path/to/output contains a.svg, b.png, and d.png.
The only file returned will be b.svg. Neither a.svg nor c.svg will be returned because there’s no equivalent PNG file in the output directory. And d.svg won’t be returned because there’s no such file in the input directory.
If more than one selector is provided, the effect is the same as a single and selector containing all of them.
The fileset element
The fileset element contains the configuration of the step.
<fs:fileset | |
default-excludes? = boolean | Enable default exclusions |
case-sensitive? = boolean | Default case sensitivity |
error-on-missing-dir? = boolean | Raise an error if the path doesn't exist |
follow-symlinks? = boolean | Default value for following symlinks |
includes? = string | One or more glob patterns to include |
excludes? = string | One or more glob patterns to exclude |
> | |
(include | exclude | (contains | date | depend | depth | different | filename | present | contains-regexp | size | readable | writable | executable | symlink | owned-by | posix-group | posix-permissions | content-type))* | |
</fs:fileset> |
The options for default exclusions, case sensitivity, whether it’s an error if the path is missing, and the default for following symbolic links must be specified either on the step or on the fileset element. It is a dynamic error (cxerr:XC0044) if they are specified in both places. If inclusions or exclusions are specified in both places, both sets of patterns (and the patterns from any nested include or exclude elements) are used.
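A minimal sketch of combining patterns from both places, using hypothetical paths and patterns: the step supplies one inclusion, the fileset document adds another inclusion and an exclusion, and all three are used together. The case-sensitive setting appears only on the fileset element. This assumes that an option left at its default on the step does not count as “specified” there. Namespace declarations are omitted.

```xml
<cx:fileset path="/path/to/project" includes="*.xml">
  <p:with-input>
    <!-- case-sensitive is set here and not on the step;
         setting it in both places would raise cxerr:XC0044 -->
    <fileset case-sensitive="false">
      <include name="*.xpl"/>
      <exclude name="build/**/*.xml"/>
    </fileset>
  </p:with-input>
</cx:fileset>
```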
The include element
The include element identifies a (single) glob pattern to include. If the if attribute is false or the unless attribute is true, the element is ignored.
<fs:include | |
name = string | A single glob pattern |
if? = string | Use this inclusion if this is true |
unless? = string | Use this inclusion unless this is true |
/> |
The exclude element
The exclude element identifies a (single) glob pattern to exclude. If the if attribute is false or the unless attribute is true, the element is ignored.
<fs:exclude | |
name = string | A single glob pattern |
if? = string | Use this exclusion if this is true |
unless? = string | Use this exclusion unless this is true |
/> |
The contains element
The contains element selects a file if it contains the specified text.
<fs:contains | |
text = string | The text that must appear in the document |
case-sensitive? = boolean | Should the search be case sensitive? |
ignore-whitespace? = boolean | Ignore whitespace? |
encoding? = string | The encoding to use when reading the document |
/> |
If ignore-whitespace is true, all white space is stripped from the search text and the file before making the comparison.
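For example, a selector along these lines would keep only the files that mention a marker string, ignoring case. The path and the search text are hypothetical, and namespace declarations are omitted.

```xml
<cx:fileset path="/path/to/src" includes="*.txt">
  <p:with-input>
    <fileset>
      <!-- Keep files that contain "todo" in any mix of upper and lower case -->
      <contains text="TODO" case-sensitive="false"/>
    </fileset>
  </p:with-input>
</cx:fileset>
```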
The content-type element
The content-type element selects a file if it has one of the specified content types.
<fs:content-type | |
content-types = string | The list of content types (as per p:input) |
/> |
This selector is not present in Ant.
The date element
The date element selects a file if its last modified time matches the constraints specified.
<fs:date | |
date-time = dateTime | The target date-time |
when = before|after|equal | The relationship to test |
granularity? = integer | Granularity of comparison |
/> |
If when is “before”, the last modified time must be before the specified date-time. If it’s “after”, it must be after. If it’s “equal” (the default), it must be equal.
For the purposes of comparison, the last modified time is equal to the specified date-time if it’s within granularity milliseconds of the specified date-time. On Windows, the default granularity is 2 seconds; on other systems, it’s 0.
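A sketch of a date selector, with a hypothetical path and timestamp: it would keep only the files last modified after the start of 2024. Namespace declarations are omitted.

```xml
<cx:fileset path="/path/to/logs">
  <p:with-input>
    <fileset>
      <!-- Keep files whose last modified time is after the given date-time -->
      <date date-time="2024-01-01T00:00:00Z" when="after"/>
    </fileset>
  </p:with-input>
</cx:fileset>
```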
The depend element
The depend element selects a file if it exists under the target-dir and if the target file is newer.
<fs:depend | |
target-dir? = anyURI | The target directory |
granularity? = integer | Granularity of comparison |
> | |
(identity-mapper | flatten-mapper | merge-mapper | glob-mapper | regexp-mapper | package-mapper | unpackage-mapper | composite-mapper | chained-mapper | first-match-mapper | cut-dirs-mapper)? | |
</fs:depend> |
For the purposes of comparison, two files are considered to have the same last modified time if they are within granularity milliseconds of each other. On Windows, the default granularity is 2 seconds; on other systems, it’s 0.
If no mapper is specified, the identity-mapper is used.
The depth element
The depth element selects a file if it is at least min directory levels and at most max directory levels from the root.
<fs:depth | |
min? = integer | The minimum depth |
max? = integer | The maximum depth |
/> |
If min is unspecified, it defaults to 0. If max is unspecified, it defaults to ∞.
The different element
The different element selects a file if it exists under the target-dir and is different.
<fs:different | |
target-dir? = anyURI | The target directory |
ignore-file-times? = boolean | Ignore last modified time? |
ignore-contents? = boolean | Ignore the file contents? |
granularity? = integer | Granularity of comparison |
> | |
(identity-mapper | flatten-mapper | merge-mapper | glob-mapper | regexp-mapper | package-mapper | unpackage-mapper | composite-mapper | chained-mapper | first-match-mapper | cut-dirs-mapper) | |
</fs:different> |
The default value for ignore-file-times is true; the default value for ignore-contents is false.
For the purposes of comparison, two files are considered to have the same last modified time if they are within granularity milliseconds of each other. On Windows, the default granularity is 2 seconds; on other systems, it’s 0.
If no mapper is specified, the identity-mapper is used.
The filename element
The filename element matches a file if it matches the glob or regular expression provided.
<fs:filename | |
name? = string | A single glob pattern |
regex? = string | A regular expression |
case-sensitive? = boolean | Should the comparison be case sensitive? |
negate? = boolean | Reverse the effect of the selection |
/> |
Exactly one of name or regex must be provided.
This element is like the include element, but it can be combined with other selectors. If negate is true, this element is like the exclude element.
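Because filename is a selector, it can sit alongside other selectors. In this sketch (hypothetical path and pattern, namespace declarations omitted), a file is kept only if its name does not match the *.bak glob and it is also readable, since multiple selectors behave like a single and selector.

```xml
<cx:fileset path="/path/to/project">
  <p:with-input>
    <fileset>
      <!-- negate="true" turns the match into an exclusion -->
      <filename name="*.bak" negate="true"/>
      <readable/>
    </fileset>
  </p:with-input>
</cx:fileset>
```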
The present element
The present element selects a file by comparing it against an equivalent file under the target-dir.
<fs:present | |
target-dir = anyURI | The target directory |
present? = srconly|both | Only in the source, or in both? |
> | |
(identity-mapper | flatten-mapper | merge-mapper | glob-mapper | regexp-mapper | package-mapper | unpackage-mapper | composite-mapper | chained-mapper | first-match-mapper | cut-dirs-mapper) | |
</fs:present> |
If present is “srconly”, the file is selected if it only exists in the source (if it is not under the target-dir). If present is “both”, the file is selected if it exists in both places. (A “target only” value is incoherent because the comparison is always against a file that does exist.)
If no mapper is specified, the identity-mapper is used.
The contains-regexp element
The contains-regexp element selects a file if it contains the specified regular expression.
<fs:contains-regexp | |
expression = string | The expression to match |
case-sensitive? = boolean | Should the search be case sensitive? |
encoding? = string | The encoding to use when reading the document |
multi-line? = boolean | Use multi-line searches |
single-line? = boolean | Use single-line searches |
/> |
If multi-line is true, the match may extend across line breaks. If single-line is true, “.” may match newlines. (This is the “dotall” flag in Java regular expressions.)
The size element
The size element selects a file based on its size.
<fs:size | |
value = integer | The value |
units? = da|h|k|M|G|T|P|Ki|Mi|Gi|Ti|Pi | The units |
when? = less|more|equal | The relationship to test |
/> |
If no units are specified, the value is an exact number of bytes. If units are specified, they have the following effect: da multiplies the value by 10, h multiplies the value by 100, k by 1,000, M by 1,000,000, G by 10^9, T by 10^12, and P by 10^15. The remaining units multiply by powers of two: Ki by 1024, Mi by 1024^2, Gi by 1024^3, Ti by 1024^4, and Pi by 1024^5.
The when attribute determines how the actual file size must compare to the specified value.
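As a worked example of the unit arithmetic, a value of 10 with units Ki is 10 × 1024 = 10,240 bytes, while a value of 10 with units k is 10 × 1,000 = 10,000 bytes. The sketch below uses a hypothetical path and assumes that when="more" means the file must be larger than the computed value. Namespace declarations are omitted.

```xml
<cx:fileset path="/path/to/images" includes="*.png">
  <p:with-input>
    <fileset>
      <!-- 10 Ki = 10 * 1024 = 10240 bytes -->
      <size value="10" units="Ki" when="more"/>
    </fileset>
  </p:with-input>
</cx:fileset>
```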
The readable element
The readable element selects a file if it is readable by the user.
<fs:readable/> |
The writable element
The writable element selects a file if it is writable by the user.
<fs:writable/> |
The executable element
The executable element selects a file if it is executable by the user.
<fs:executable/> |
The symlink element
The symlink element selects a file if it is a symbolic link.
<fs:symlink/> |
Symbolic links are platform specific. This selector returns true if the underlying JVM considers a file on the particular filesystem in use a “symbolic link”.
The owned-by element
The owned-by element selects files that are owned by owner.
<fs:owned-by | |
owner = string | The owner name |
follow-symlinks? = boolean | Follow symlinks? |
/> |
If the file being tested is a symbolic link and follow-symlinks is true, the ownership of the file linked to is tested. Otherwise the owner of the link is tested.
The posix-group element
The posix-group element selects files that are members of the specified group.
<fs:posix-group | |
group = string | The group name |
follow-symlinks? = boolean | Follow symlinks? |
/> |
If the file being tested is a symbolic link and follow-symlinks is true, the group of the file linked to is tested. Otherwise the group of the link is tested.
This selector requires a POSIX compatible filesystem. It will always return false in other cases, for example, on Windows.
The posix-permissions element
The posix-permissions element selects files that have specific permissions.
<fs:posix-permissions | |
permissions = string | The permissions |
follow-symlinks? = boolean | Follow symlinks? |
/> |
Permissions can be expressed as an octal number (“755”) or using the r/w/- notation (“rwxr-xr-x”). In either case, the match is exact. The selector doesn’t support any kind of wildcard matching.
If the file being tested is a symbolic link and follow-symlinks is true, the permissions of the file linked to are tested. Otherwise the permissions of the link are tested.
This selector requires a POSIX compatible filesystem. It will always return false in other cases, for example, on Windows.
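The two notations describe the same permission bits: octal 755 is rwx for the owner (7), r-x for the group (5), and r-x for others (5). A sketch with a hypothetical path, namespace declarations omitted:

```xml
<cx:fileset path="/path/to/bin">
  <p:with-input>
    <fileset>
      <!-- Equivalent to permissions="755"; the match is exact -->
      <posix-permissions permissions="rwxr-xr-x"/>
    </fileset>
  </p:with-input>
</cx:fileset>
```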
The chained-mapper element
The chained-mapper returns the result of applying each mapper in turn. The initial file is the input to the first mapper, the output from the first mapper is the input to the second mapper, and so on.
<fs:chained-mapper> | |
(identity-mapper | flatten-mapper | merge-mapper | glob-mapper | regexp-mapper | package-mapper | unpackage-mapper | composite-mapper | chained-mapper | first-match-mapper | cut-dirs-mapper)+ | |
</fs:chained-mapper> |
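A sketch of chaining, with hypothetical paths and namespace declarations omitted: the flatten-mapper first strips the leading path segments, then the glob-mapper swaps the extension, so sub/dir/a.svg would become a.svg and then a.png. The present selector then looks for that name under the target directory.

```xml
<cx:fileset path="/path/to/input" includes="*.svg">
  <p:with-input>
    <fileset>
      <present target-dir="/path/to/output">
        <chained-mapper>
          <!-- Step 1: sub/dir/a.svg becomes a.svg -->
          <flatten-mapper/>
          <!-- Step 2: a.svg becomes a.png -->
          <glob-mapper from="*.svg" to="*.png"/>
        </chained-mapper>
      </present>
    </fileset>
  </p:with-input>
</cx:fileset>
```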
The composite-mapper element
The composite-mapper returns the result of applying the (same) initial file to each mapper. This is the union of all the mapper outputs.
<fs:composite-mapper> | |
(identity-mapper | flatten-mapper | merge-mapper | glob-mapper | regexp-mapper | package-mapper | unpackage-mapper | composite-mapper | chained-mapper | first-match-mapper | cut-dirs-mapper)+ | |
</fs:composite-mapper> |
The cut-dirs-mapper element
The cut-dirs-mapper removes dirs path segments from the front of the file. If the file has fewer path segments, nothing is returned.
<fs:cut-dirs-mapper | |
dirs = integer | The number of directories to remove |
/> |
The first-match-mapper element
The first-match-mapper applies the original file to each mapper in turn, returning the results of the first mapper that succeeds.
<fs:first-match-mapper> | |
(identity-mapper | flatten-mapper | merge-mapper | glob-mapper | regexp-mapper | package-mapper | unpackage-mapper | composite-mapper | chained-mapper | first-match-mapper | cut-dirs-mapper)+ | |
</fs:first-match-mapper> |
The flatten-mapper element
The flatten-mapper returns the filename of the original file with all leading path segments removed.
<fs:flatten-mapper/> |
The glob-mapper element
The glob-mapper matches the original file against the from glob. If it doesn’t match, nothing is returned. If it does match, the to glob is used to change the file.
<fs:glob-mapper | |
from = string | The match glob |
to = string | The target glob |
case-sensitive? = boolean | Default case sensitivity |
/> |
The from and to globs must each contain exactly one *. The text matched by the * in the from glob is used to replace the * in the to glob.
The identity-mapper element
The identity-mapper returns the original file unchanged.
<fs:identity-mapper/> |
The merge-mapper element
The merge-mapper returns the to value irrespective of the original file.
<fs:merge-mapper | |
to = string | The target value |
/> |
The package-mapper element
The package-mapper applies the same processing as the glob-mapper, then replaces all of the “/” characters with “.”.
<fs:package-mapper | |
from = string | The match glob |
to = string | The target glob |
/> |
This turns, for example, org/example/package/Class.java into org.example.package.Class.html assuming the from value is “*.java” and the to value is “*.html”.
The unpackage-mapper element
The unpackage-mapper applies the same processing as the glob-mapper, then replaces all but the last “.” character with “/”.
<fs:unpackage-mapper | |
from = string | The match glob |
to = string | The target glob |
/> |
This is the reverse of the package-mapper.
The regexp-mapper element
The regexp-mapper matches the original file against the from regular expression. If it doesn’t match, nothing is returned. If it does match, the to expression is used to change the file.
<fs:regexp-mapper | |
from = string | The match regular expression |
to = string | The replacement expression |
case-sensitive? = boolean | Default case sensitivity |
/> |
Each occurrence of “\0” to “\9” is replaced with the corresponding match group from the from expression (where “\0” is the whole string, “\1” is the first match group, etc.).
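For instance, assuming the Java regular expression syntax of the underlying platform, a mapper along these lines would turn src/Example.java into src/Example.class, with “\1” standing for the text captured by the first group. The file names are hypothetical.

```xml
<!-- (.*) captures everything before ".java"; \1 reinserts it -->
<regexp-mapper from="^(.*)\.java$" to="\1.class"/>
```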
Default exclusions
The default exclusions are:
Any file or directory named .DS_Store, .bzr, .bzrignore, .cvsignore, .git, .gitattributes, .gitignore, .gitmodules, .hg, .hgignore, .hgsub, .hgsubstate, .hgtags, .svn, CVS, SCCS, or vssver.scc. If a directory name is excluded, so are all of its descendants.
Any file with a name that ends with ~.
Any file with a name that starts with .# or ._.
Any file with a name that begins and ends with # or begins and ends with %.
Globs
A glob is a file or path that may contain the wildcards “*” or “**/”.
The “*” wildcard matches any number of characters except “/”.
The “**/” wildcard matches any number of path segments (including none).
If a glob begins with “/”, it is anchored at the root. (Otherwise, it is logically preceded by “**/”.)
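The following include elements sketch how these rules play out. They assume a hypothetical starting path that contains a.xml, docs/b.xml, and docs/api/c.xml, and they assume that “the root” means the step’s path. Each pattern is annotated on its own, as if it were the only inclusion.

```xml
<fileset>
  <!-- Unanchored, so treated as **/*.xml: matches all three files -->
  <include name="*.xml"/>
  <!-- Anchored at the root: matches only a.xml, because * cannot cross "/" -->
  <include name="/*.xml"/>
  <!-- Matches docs/b.xml (**/ matching no segments) and docs/api/c.xml -->
  <include name="docs/**/*.xml"/>
</fileset>
```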