cx:pdf-extract

Name

cx:pdf-extract — Extract pages from a PDF.

Synopsis

This step extracts pages from a PDF.

Input port	Primary	Sequence	Content types
source	✔		application/pdf

Output port	Primary	Sequence	Content types
result	✔		application/pdf

Option name	Type	Values	Default value
compression	xs:string	('none', 'default')	'default'
pages	xs:string		()
password	xs:string?		()

This is an extension step; to use it, your pipeline must include its declaration. For example, import the extension library at the top of your pipeline:

<p:import href="https://xmlcalabash.com/ext/library/pdf-steps.xpl"/>

Declaration

<p:declare-step xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:p="http://www.w3.org/ns/xproc"
                type="cx:pdf-extract">
   <p:input port="source" content-types="application/pdf"/>
   <p:output port="result" content-types="application/pdf"/>
   <p:option name="password" as="xs:string?"/>
   <p:option name="pages" as="xs:string"/>
   <p:option name="compression" values="('none', 'default')" select="'default'"/>
</p:declare-step>

Description

This step extracts pages from a PDF. It can also duplicate and re-order pages.

The pages option specifies the pages to extract (to be included) in the result PDF. Individual page numbers, and page ranges, may be specified. For example, “1,3,10-12” would create a new PDF containing pages 1, 3, 10, 11, and 12 from the original PDF. Pages can be reordered; “2,1” would create a PDF containing page 2 of the original PDF followed by page 1 (and no other pages). Pages can also be duplicated; “1,1” would create a PDF containing two pages, both copies of page 1 from the original PDF.
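
As an illustration, here is a minimal sketch of a pipeline that applies the first page specification above. It assumes XProc 3.0 and a local file named input.pdf; the file name is a placeholder, and the pipeline is not taken from the XML Calabash test suite.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="3.0">
   <!-- Make the cx:pdf-extract declaration available. -->
   <p:import href="https://xmlcalabash.com/ext/library/pdf-steps.xpl"/>

   <p:output port="result" content-types="application/pdf"/>

   <!-- input.pdf is a placeholder; substitute your own document. -->
   <p:load href="input.pdf" content-type="application/pdf"/>

   <!-- Keep pages 1, 3, and 10 through 12, in that order. -->
   <cx:pdf-extract pages="1,3,10-12"/>
</p:declare-step>
```

Changing the pages value to “2,1” or “1,1” in the same pipeline would reorder or duplicate pages as described above.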

If a sequential subset of pages is selected, the resulting PDF is created by simply deleting the other pages. This always results in a smaller PDF file. Reordering or duplicating pages requires making copies of each page. This can result in a larger PDF result, even though it’s a subset of the pages, because global resources in the original PDF (images, fonts, etc.) may be copied into each individual page copy.

In principle, it’s possible to optimize the resulting PDF. In practice, it’s not clear how to do it reliably.

Document properties

No document properties are preserved.

Additional examples

The XML Calabash test suite contains examples of the cx:pdf-extract step.
