Name
cx:pdf-extract — Extract pages from a PDF.
Synopsis
This step extracts pages from a PDF.
| Input port | Primary | Sequence | Content types |
|---|---|---|---|
| source | ✔ | | application/pdf |

| Output port | Primary | Sequence | Content types |
|---|---|---|---|
| result | ✔ | | application/pdf |

| Option name | Type | Values | Default value |
|---|---|---|---|
| compression | xs:string | ('none', 'default') | 'default' |
| pages | xs:string | () | |
| password | xs:string? | () | |
```xml
<p:import href="https://xmlcalabash.com/ext/library/pdf-steps.xpl"/>
```
Declaration
```xml
<p:declare-step xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:p="http://www.w3.org/ns/xproc"
                type="cx:pdf-extract">
  <p:input port="source" content-types="application/pdf"/>
  <p:output port="result" content-types="application/pdf"/>
  <p:option name="password" as="xs:string?"/>
  <p:option name="pages" as="xs:string"/>
  <p:option name="compression" values="('none', 'default')" select="'default'"/>
</p:declare-step>
```
Description
This step extracts pages from a PDF. It can also duplicate and re-order pages.
The pages option specifies which pages of the original PDF are included in the result PDF. It may list individual page numbers and page ranges, separated by commas. For example, “1,3,10-12” would create a new PDF containing pages 1, 3, 10, 11, and 12 from the original PDF. Pages can be reordered: “2,1” would create a PDF containing page 2 of the original PDF followed by page 1 (and no other pages). Pages can also be duplicated: “1,1” would create a PDF containing two pages, both copies of page 1 from the original PDF.
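A minimal pipeline sketch of the pages option in use, assuming a local file named input.pdf (a hypothetical placeholder) and the step library import shown in the synopsis. It loads the PDF with p:load, whose result flows into the source port of cx:pdf-extract, and exposes the extracted PDF on the pipeline’s result port:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="3.0">
  <!-- Make cx:pdf-extract available to this pipeline -->
  <p:import href="https://xmlcalabash.com/ext/library/pdf-steps.xpl"/>
  <p:output port="result" content-types="application/pdf"/>

  <!-- input.pdf is a placeholder name for the PDF to process -->
  <p:load href="input.pdf" content-type="application/pdf"/>

  <!-- Keep pages 1, 3, 10, 11, and 12 of the original, in that order -->
  <cx:pdf-extract pages="1,3,10-12"/>
</p:declare-step>
```

Changing the attribute to pages="2,1" or pages="1,1" reorders or duplicates pages in the way described above.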
If the selected pages are a subset of the original pages in their original order, the resulting PDF is created by simply deleting the other pages. This always results in a smaller PDF file. Reordering or duplicating pages requires making a copy of each page. This can produce a PDF that is larger than the original, even though it contains only a subset of the pages, because global resources in the original PDF (images, fonts, etc.) may be copied into each individual page copy.
In principle, it’s possible to optimize the resulting PDF. In practice, it’s not clear how to do it reliably.
Document properties
No document properties are preserved.
Additional examples
The XML Calabash test suite contains examples of the cx:pdf-extract step.